## Section: New Results

### Statistical aspects of topological and geometric data analysis

#### Stability and Minimax Optimality of Tangential Delaunay Complexes for Manifold Reconstruction

Participant: Eddie Aamari.

In collaboration with C. Levrard (Univ. Paris Diderot).

We consider the problem of optimality in manifold reconstruction. We observe a random sample ${\mathbb{X}}_{n}=\{{X}_{1},\dots ,{X}_{n}\}\subset {\mathbb{R}}^{D}$ of points lying on a $d$-dimensional submanifold $M$, with or without outliers drawn in the ambient space. Based on the tangential Delaunay complex, we construct an estimator $\widehat{M}$ that is ambient isotopic and Hausdorff-close to $M$ with high probability. The estimator $\widehat{M}$ is built from existing algorithms. In a model without outliers, we show that this estimator is asymptotically minimax optimal for the Hausdorff distance over a class of submanifolds satisfying a reach condition. Therefore, even with no a priori information on the tangent spaces of $M$, our estimator based on tangential Delaunay complexes is optimal; in other words, the optimal rate of convergence can be achieved with existing algorithms. A similar result is derived in a model with outliers. We also establish a geometric interpolation result showing that the tangential Delaunay complex is stable with respect to noise and to perturbations of the tangent spaces. Along the way, we study a denoising procedure and a tangent space estimator, both based on local principal component analysis (PCA) [32].
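To make the local PCA step concrete, here is a minimal pure-Python sketch for the simplest case $d=1$: the tangent direction at a point $x$ is estimated as the top eigenvector of the covariance matrix of the sample points lying within a radius $h$ of $x$. The radius `h`, the function name `local_pca_tangent` and the use of power iteration are assumptions made for illustration; the estimator analyzed in [32] may differ in its choice of neighborhood and in its handling of outliers.

```python
import math

def local_pca_tangent(points, x, h, iters=200):
    """Hypothetical helper: estimate a unit tangent direction at x (case d = 1).

    Uses the sample points within Euclidean distance h of x and returns the
    top eigenvector of their covariance matrix, computed by power iteration
    (assumes a spectral gap and a starting vector not orthogonal to it).
    """
    nbrs = [p for p in points if math.dist(p, x) <= h]
    if len(nbrs) < 2:
        raise ValueError("need at least two neighbors within radius h")
    dim, k = len(x), len(nbrs)
    mean = [sum(p[j] for p in nbrs) / k for j in range(dim)]
    # Local (biased) covariance matrix of the neighborhood.
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in nbrs) / k
            for j in range(dim)] for i in range(dim)]
    # Power iteration: repeatedly apply cov and renormalize.
    v = [1.0] * dim
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v
```

For example, on 400 equally spaced points of the unit circle, `local_pca_tangent(points, (1.0, 0.0), 0.3)` should return a vector close to $(0,\pm 1)$. For $d>1$ one would instead keep the top $d$ eigenvectors of the local covariance matrix.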

#### Rates in the Central Limit Theorem and diffusion approximation via Stein's Method

Participant: Thomas Bonis.

We present a way to apply Stein's method to bound the Wasserstein distance between a (possibly discrete) measure and a measure assumed to be the invariant measure of a diffusion operator. We use this construction to obtain convergence rates in the Central Limit Theorem in dimension 1, in terms of the $p$-Wasserstein distance for $p\ge 2$, under precise moment conditions. We also establish a similar result for the Wasserstein distance of order 2 in the multidimensional setting. We then study the convergence of stationary distributions of Markov chains in the context of diffusion approximation, with applications to density estimation from geometric random graphs and to sampling with the Langevin Monte Carlo algorithm [33].
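In dimension 1, the $p$-Wasserstein distance has the closed form $W_p(\mu,\nu)^p=\int_0^1 |F_\mu^{-1}(u)-F_\nu^{-1}(u)|^p\,du$ in terms of quantile functions, which makes the Central Limit Theorem rates discussed above easy to observe numerically. The sketch below is only an illustration (it does not implement Stein's method and is not code from [33]): it computes $W_p$ between the law of the normalized Rademacher sum $S_n=n^{-1/2}\sum_{i=1}^n \varepsilon_i$ and the standard Gaussian, approximating the integral with a midpoint rule. The grid size and the function names are arbitrary choices.

```python
import math
from statistics import NormalDist

def rademacher_sum_law(n):
    """Sorted atoms and probabilities of S_n = (e_1 + ... + e_n) / sqrt(n), e_i = +/-1."""
    atoms = [(2 * k - n) / math.sqrt(n) for k in range(n + 1)]
    probs = [math.comb(n, k) / 2 ** n for k in range(n + 1)]
    return atoms, probs

def wasserstein_p_to_gaussian(n, p=2, grid=20000):
    """Midpoint-rule approximation of W_p(law of S_n, N(0, 1)) via quantile functions."""
    atoms, probs = rademacher_sum_law(n)
    gauss_quantile = NormalDist().inv_cdf
    idx, cdf, total = 0, probs[0], 0.0
    for i in range(grid):
        u = (i + 0.5) / grid
        # Quantile of S_n at level u: smallest atom whose cumulative probability reaches u.
        while u > cdf and idx < n:
            idx += 1
            cdf += probs[idx]
        total += abs(atoms[idx] - gauss_quantile(u)) ** p
    return (total / grid) ** (1 / p)
```

Comparing `wasserstein_p_to_gaussian(4)` with `wasserstein_p_to_gaussian(100)` should show the distance shrinking roughly like $n^{-1/2}$, the expected order for a lattice distribution such as $S_n$.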

#### Rates of Convergence for Robust Geometric Inference

Participants: Frédéric Chazal, Bertrand Michel.

In collaboration with P. Massart (Univ. Paris Sud and Inria Select team).

Distances to compact sets are widely used in Topological Data Analysis to infer geometric and topological features from point clouds. In this context, the distance to a probability measure (DTM) was introduced by Chazal et al. as a robust alternative to the distance to a compact set. In practice, the DTM can be estimated by its empirical counterpart, the distance to the empirical measure (DTEM). In this paper, we give a tight control of the deviation of the DTEM. Our approach relies on a local analysis of empirical processes. In particular, we show that the rate of convergence of the DTEM depends directly on the regularity at zero of a particular quantile function, which carries local information about the geometry of the support. This quantile function is the relevant quantity for describing precisely how difficult a geometric inference problem is. Several numerical experiments illustrate the convergence of the DTEM and confirm that our bounds are tight [19].
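For reference, a minimal pure-Python sketch of the DTEM, assuming the standard definition of the DTM with mass parameter $m\in(0,1]$, namely $d_{\mu,m}^2(x)=\frac{1}{m}\int_0^m \delta_{\mu,u}(x)^2\,du$, where $\delta_{\mu,u}(x)$ is the smallest radius $r$ such that the closed ball $\bar B(x,r)$ has $\mu$-mass at least $u$. For the empirical measure on $n$ points, this reduces to a weighted average of the squared distances from $x$ to its nearest sample points. The function name `dtm_empirical` is illustrative; this is not code from [19].

```python
import math

def dtm_empirical(sample, x, m):
    """Distance from x to the empirical measure of `sample`, with mass parameter m in (0, 1]."""
    n = len(sample)
    sq = sorted(math.dist(x, p) ** 2 for p in sample)  # squared nearest-neighbor distances
    k0 = math.floor(m * n)   # number of neighbors carrying their full weight 1/n
    total = sum(sq[:k0]) / n
    rest = m - k0 / n        # leftover mass assigned to the (k0 + 1)-th neighbor
    if rest > 0 and k0 < n:
        total += rest * sq[k0]
    return math.sqrt(total / m)
```

As $m$ tends to 0, the value approaches the distance from $x$ to its nearest sample point; larger values of $m$ average over more neighbors, which is what makes the DTM robust to a small fraction of outliers.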

#### Data driven estimation of Laplace-Beltrami operator

Participants: Frédéric Chazal, Bertrand Michel, Ilaria Giulini.

Approximations of Laplace-Beltrami operators on manifolds through graph Laplacians have become popular tools in data analysis and machine learning. These discretized operators usually depend on bandwidth parameters whose tuning remains a theoretical and practical problem. In this paper, we address this problem for the unnormalized graph Laplacian by establishing an oracle inequality that opens the door to a well-founded data-driven procedure for the bandwidth selection. Our approach relies on recent results by Lacour and Massart on the so-called Lepski’s method [26].
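As an illustration of the object whose bandwidth is being tuned, the following pure-Python sketch applies an unnormalized graph Laplacian with Gaussian weights $W_{ij}=\exp(-\|X_i-X_j\|^2/h^2)$ to a function sampled at the points: $(Lf)_i=\sum_j W_{ij}\,(f(X_i)-f(X_j))$. The Gaussian kernel, this exact form of the weights and the function names are assumptions; the normalizing factor (which depends on $h$ and on the intrinsic dimension) needed for convergence to the Laplace-Beltrami operator is omitted, and the bandwidth selection procedure of [26] is not implemented.

```python
import math

def gaussian_weights(points, h):
    """Hypothetical helper: Gaussian weight matrix with bandwidth h (zero diagonal)."""
    n = len(points)
    return [[0.0 if i == j else math.exp(-math.dist(points[i], points[j]) ** 2 / h ** 2)
             for j in range(n)] for i in range(n)]

def graph_laplacian_apply(points, f, h):
    """Apply the unnormalized graph Laplacian L = D - W to the values f[i] = f(X_i)."""
    W = gaussian_weights(points, h)
    n = len(points)
    return [sum(W[i][j] * (f[i] - f[j]) for j in range(n)) for i in range(n)]
```

Constant functions lie in the kernel of $L$, and the quadratic form $\sum_i f_i (Lf)_i=\tfrac12\sum_{i,j}W_{ij}(f_i-f_j)^2$ is nonnegative. The bandwidth $h$ controls which pairs of points effectively interact, which is why its choice matters.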