Section: New Results


Participant: Gérard Biau.

Geometric inference

This line of research is carried out in collaboration with the Geometrica project-team (INRIA Saclay), which describes its motivation as follows:

Due to the fast evolution of data acquisition devices and computational power, scientists in many areas are demanding efficient algorithmic tools for analyzing, manipulating and visualizing more and more complex shapes or complex systems from approximating data. Many of the existing algorithmic solutions which come with little theoretical guarantees provide unsatisfactory and/or unpredictable results. Since these algorithms take as input discrete geometric data, it is mandatory to develop concepts that are rich enough to robustly and correctly approximate continuous shapes and their geometric properties by discrete models. Ensuring the correctness of geometric estimations and approximations on discrete data is a sensitive problem in many applications.

Thus, motivated by a broad range of potential applications in topological and geometric inference, we introduce in [15] a weighted version of the k-nearest neighbor density estimator. Various pointwise consistency results are established for this estimator, and the proposed method is implemented to recover level sets in both simulated and real-life data.
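To make the idea of a weighted k-nearest neighbor density estimate concrete, here is a minimal sketch in Python. The exact form of the estimator in [15] is not given above, so the formula used here is an assumption: with nonnegative weights p_1, ..., p_k and R_j(x) the distance from x to its j-th nearest neighbor, the estimate is taken to be (sum_j p_j j) / (n V_d sum_j p_j R_j(x)^d), where V_d is the volume of the unit ball. The function names are hypothetical.

```python
import math


def unit_ball_volume(d):
    """Volume of the Euclidean unit ball in dimension d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def weighted_knn_density(x, sample, weights):
    """Weighted k-NN density estimate at x (assumed form, see lead-in).

    x       : tuple of d coordinates
    sample  : list of n tuples of d coordinates
    weights : nonnegative weights p_1..p_k, one per neighbor rank, k <= n

    Note: the denominator is zero if every weighted neighbor lies at
    distance 0 from x (e.g. all weight on rank 1 and x is a sample point).
    """
    n = len(sample)
    d = len(x)
    k = len(weights)
    # Distances from x to its k nearest sample points, in increasing order.
    radii = sorted(math.dist(x, xi) for xi in sample)[:k]
    numerator = sum(p * j for j, p in enumerate(weights, start=1))
    denominator = n * unit_ball_volume(d) * sum(
        p * r ** d for p, r in zip(weights, radii)
    )
    return numerator / denominator
```

Putting all the weight on the last rank, for instance `weights = [0.0] * (k - 1) + [1.0]`, recovers the classical k-nearest neighbor estimate k / (n V_d R_k(x)^d). Level sets can then be approximated by thresholding the estimate over a grid of points.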

Another problem of geometric inference, studied in [16], is the following. Principal curves are nonlinear generalizations of the notion of first principal component. Roughly, a principal curve is a parameterized curve in ℝ^d that passes through the “middle” of a data cloud drawn from some unknown probability distribution. Depending on the definition, a principal curve relies on some unknown parameters (number of segments, length, turn, ...) which have to be properly chosen to recover the shape of the data without interpolating. In this paper, we consider the principal curve problem from an empirical risk minimization perspective and address the parameter selection issue from the point of view of model selection via penalization. We offer oracle inequalities and implement the proposed approaches to recover the hidden structures in both simulated and real-life data.
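The penalized selection principle can be sketched as follows. This is an illustration only, not the procedure of [16]: candidate curves are polygonal lines supplied by the user, the empirical risk is taken to be the mean squared distance from the data to the curve, and the penalty c·k/n (k segments, n points, hypothetical constant c) is an assumed placeholder for the penalty shapes derived in the paper. All function names are hypothetical.

```python
def dist2_to_segment(p, a, b):
    """Squared Euclidean distance from point p to the segment [a, b]."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    if denom == 0:
        t = 0.0  # degenerate segment: a == b
    else:
        # Projection parameter, clipped so the projection stays on the segment.
        t = max(0.0, min(1.0, sum(u * v for u, v in zip(ap, ab)) / denom))
    proj = [ai + t * c for ai, c in zip(a, ab)]
    return sum((pi - qi) ** 2 for pi, qi in zip(p, proj))


def empirical_risk(points, vertices):
    """Mean squared distance from the points to a polygonal line."""
    segments = list(zip(vertices[:-1], vertices[1:]))
    total = sum(
        min(dist2_to_segment(p, a, b) for a, b in segments) for p in points
    )
    return total / len(points)


def select_curve(points, candidates, c=1.0):
    """Pick the number of segments minimizing risk + c * k / n.

    candidates maps a number of segments k to the vertex list of a
    polygonal line with k segments. The penalty c * k / n is an assumption.
    """
    n = len(points)
    criterion = {
        k: empirical_risk(points, vertices) + c * k / n
        for k, vertices in candidates.items()
    }
    best_k = min(criterion, key=criterion.get)
    return best_k, criterion
```

For data lying on an L-shaped path, a two-segment candidate following the path has zero empirical risk and is selected for a small penalty constant, while a large constant pushes the choice back to a single segment. This is the trade-off between fitting the data and interpolating that the penalty is meant to control.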

Statistical inference

We still keep an eye on more traditional mathematical statistics; in particular, the technical report [31] falls within this field. It shows, for a large class of distributions and large samples, that estimates of the variance σ² and of the standard deviation σ are more often Pitman closer to their target than the corresponding shrinkage estimates, which improve the mean squared error. The results thus indicate that the Pitman closeness criterion, despite its controversial nature, should be regarded as a useful and complementary tool for the evaluation of estimates of σ² and of σ.
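A small Monte Carlo experiment illustrates the contrast between the two criteria. The setting below is an assumption chosen for simplicity and is much narrower than the class of distributions covered by [31]: Gaussian samples, the unbiased variance estimate SS/(n-1), and the shrinkage estimate SS/(n+1), where SS is the sum of squared deviations from the sample mean. The function name and default parameters are hypothetical.

```python
import random


def pitman_vs_mse(n=20, sigma=1.0, reps=20000, seed=0):
    """Compare SS/(n-1) and SS/(n+1) as estimates of sigma**2.

    Returns a triple:
      - fraction of samples where SS/(n-1) is strictly closer to sigma**2,
      - Monte Carlo mean squared error of SS/(n-1),
      - Monte Carlo mean squared error of SS/(n+1).
    """
    rng = random.Random(seed)
    target = sigma ** 2
    wins = 0
    sq_err_unbiased = 0.0
    sq_err_shrunk = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(xs) / n
        ss = sum((x - mean) ** 2 for x in xs)
        unbiased = ss / (n - 1)
        shrunk = ss / (n + 1)
        # Pitman comparison: which estimate lands closer on this sample?
        wins += abs(unbiased - target) < abs(shrunk - target)
        sq_err_unbiased += (unbiased - target) ** 2
        sq_err_shrunk += (shrunk - target) ** 2
    return wins / reps, sq_err_unbiased / reps, sq_err_shrunk / reps
```

In this Gaussian setting the shrinkage estimate has the smaller mean squared error, yet the unbiased estimate is the closer of the two on more than half of the simulated samples, which is the kind of disagreement between the two criteria that the report examines.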