DATASHAPE - 2017 - Annual activity report

Project-Team Datashape

Section: New Results

Statistical aspects of topological and geometric data analysis

The DTM-signature for a geometric comparison of metric-measure spaces from samples

Participant: Claire Brécheteau.

In [43], we introduce the notion of DTM-signature, a measure on $ℝ_{+}$ that can be associated with any metric-measure space. This signature is based on the distance to a measure (DTM) introduced by Chazal, Cohen-Steiner and Mérigot. It yields a pseudo-metric between metric-measure spaces that is upper-bounded by the Gromov-Wasserstein distance. Under some geometric assumptions, we derive lower bounds for this pseudo-metric. Given two N-samples, we also build an asymptotic statistical test based on the DTM-signature to reject the hypothesis that the two underlying metric-measure spaces are equal up to a measure-preserving isometry. We give strong theoretical justifications for this test and propose an algorithm for its implementation.
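To make the construction concrete, the following Python sketch computes an empirical DTM-signature and compares two samples. It assumes that the empirical DTM at $x$ is the root mean squared distance from $x$ to its $k$ nearest sample points, with $k = \lceil mN \rceil$ for a mass parameter $m$, and that two signatures are compared with the 1-Wasserstein distance on $ℝ$. The function names are hypothetical, and the asymptotic test itself (with its resampling step) is not reproduced.

```python
import math


def dtm(x, sample, k):
    """Empirical DTM at x: root mean squared distance to the k nearest sample points."""
    sq = sorted(sum((a - b) ** 2 for a, b in zip(x, p)) for p in sample)
    return math.sqrt(sum(sq[:k]) / k)


def dtm_signature(sample, m):
    """Sorted DTM values at the sample points (an empirical measure on R_+).

    Assumption: k = ceil(m * N) neighbours for a mass parameter m in (0, 1].
    """
    k = max(1, math.ceil(m * len(sample)))
    return sorted(dtm(p, sample, k) for p in sample)


def signature_distance(sample1, sample2, m):
    """1-Wasserstein distance between the DTM-signatures of two N-samples.

    On R, W1 between two empirical measures with N atoms each is the mean
    absolute difference between their sorted values.
    """
    s1 = dtm_signature(sample1, m)
    s2 = dtm_signature(sample2, m)
    if len(s1) != len(s2):
        raise ValueError("this sketch assumes two samples of the same size N")
    return sum(abs(a - b) for a, b in zip(s1, s2)) / len(s1)
```

Because the signature depends only on pairwise distances, rotating a sample leaves it unchanged, whereas rescaling the sample changes it.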

Estimating the Reach of a Manifold

Participants: Eddie Aamari, Frédéric Chazal, Bertrand Michel.

In collaboration with J. Kim, A. Rinaldo, L. Wasserman (Carnegie Mellon University).

Various problems of computational geometry and manifold learning encode geometric regularity through the so-called reach, a generalized convexity parameter. The reach $τ_{M}$ of a submanifold $M \subset ℝ^{D}$ is the maximal offset radius on which the projection onto $M$ is well defined. The quantity $τ_{M}$ captures a minimal scale of $M$, giving bounds on both its maximum curvature and its possible bottleneck structures. In [35], we study the geometry of the reach from an approximation perspective. We derive new geometric results on the reach of submanifolds without boundary. An estimator $\hat{τ}$ of $τ_{M}$ is proposed in a framework where tangent spaces are known, and bounds assessing its efficiency are derived. In the case of an i.i.d. random point cloud $𝕏_{n}$, $\hat{τ}(𝕏_{n})$ is shown to achieve uniform expected loss bounds over a $𝒞^{3}$-like model. Minimax upper and lower bounds are derived, and we conclude with an extension to a model with unknown tangent spaces.
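For intuition, the reach admits Federer's characterization $τ_{M} = \inf_{x \neq y \in M} \frac{\|y - x\|^{2}}{2\, d(y - x, T_{x}M)}$. The sketch below is our own illustration, not the code of [35]: it evaluates this infimum over all pairs of a finite point cloud whose tangent spaces are given as orthonormal bases, i.e. a plug-in estimator in the known-tangent-space setting. The names `reach_estimate` and `dist_to_tangent` are hypothetical.

```python
import math


def _sub(u, v):
    return [a - b for a, b in zip(u, v)]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dist_to_tangent(v, basis):
    """Norm of the component of v orthogonal to span(basis).

    `basis` must be an orthonormal family spanning the tangent space.
    """
    proj = [0.0] * len(v)
    for e in basis:
        c = _dot(v, e)
        proj = [p + c * ei for p, ei in zip(proj, e)]
    w = _sub(v, proj)
    return math.sqrt(_dot(w, w))


def reach_estimate(points, tangents, tol=1e-12):
    """Plug-in reach estimate: infimum of |y - x|^2 / (2 d(y - x, T_x)) over pairs.

    points[i] is a point of M and tangents[i] an orthonormal basis of T_{x_i}M.
    Pairs whose difference is (numerically) tangent are skipped, i.e. count as +inf.
    """
    best = math.inf
    for i, x in enumerate(points):
        for j, y in enumerate(points):
            if i == j:
                continue
            v = _sub(y, x)
            d = dist_to_tangent(v, tangents[i])
            if d > tol:
                best = min(best, _dot(v, v) / (2.0 * d))
    return best
```

On a circle of radius $r$, every pair of distinct points gives exactly $r$, so the estimate recovers the radius, which is indeed the reach of the circle.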

Robust Topological Inference: Distance To a Measure and Kernel Distance

Participants: Frédéric Chazal, Bertrand Michel.

In collaboration with B. Fasy, F. Lecci, A. Rinaldo, L. Wasserman.

Let $P$ be a distribution with support $S$. The salient features of $S$ can be quantified with persistent homology, which summarizes topological features of the sublevel sets of the distance function (the distance from any point $x$ to $S$). Given a sample from $P$, we can infer the persistent homology using an empirical version of the distance function. However, the empirical distance function is highly non-robust to noise and outliers: even a single outlier can drastically change it. The distance-to-a-measure (DTM) and the kernel distance are smooth functions that provide useful topological information while being robust to noise and outliers. In [17], we derive limiting distributions and confidence sets, and we propose a method for choosing tuning parameters.
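The following sketch illustrates the robustness claim on a toy example. It assumes the empirical DTM built from the $k$ nearest neighbours and a kernel distance between the Dirac mass at $x$ and the empirical measure, using a Gaussian kernel with bandwidth $h$; the function names, the choice of kernel and the parameter values are illustrative assumptions, not the settings of [17].

```python
import math


def _sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def distance_function(x, sample):
    """Empirical distance function: distance from x to the nearest sample point."""
    return math.sqrt(min(_sqdist(x, p) for p in sample))


def dtm(x, sample, k):
    """Empirical DTM: root mean squared distance to the k nearest sample points."""
    sq = sorted(_sqdist(x, p) for p in sample)
    return math.sqrt(sum(sq[:k]) / k)


def kernel_distance(x, sample, h):
    """Kernel distance between the Dirac mass at x and the empirical measure.

    Gaussian kernel K(p, q) = exp(-|p - q|^2 / (2 h^2)), so K(x, x) = 1 and
    D^2 = mean_{i,j} K(X_i, X_j) + K(x, x) - 2 mean_i K(x, X_i).
    """
    def K(p, q):
        return math.exp(-_sqdist(p, q) / (2.0 * h * h))

    n = len(sample)
    mean_pairs = sum(K(p, q) for p in sample for q in sample) / (n * n)
    mean_x = sum(K(x, p) for p in sample) / n
    return math.sqrt(max(0.0, mean_pairs + 1.0 - 2.0 * mean_x))


if __name__ == "__main__":
    circle = [(math.cos(2 * math.pi * i / 100), math.sin(2 * math.pi * i / 100))
              for i in range(100)]
    query = (5.0, 0.0)
    noisy = circle + [query]  # a single outlier placed exactly at the query point
    for name, fn in [("distance function", lambda s: distance_function(query, s)),
                     ("DTM, k=10", lambda s: dtm(query, s, 10)),
                     ("kernel distance, h=0.5", lambda s: kernel_distance(query, s, 0.5))]:
        print(f"{name}: clean={fn(circle):.3f}, with outlier={fn(noisy):.3f}")
```

With 100 points on the unit circle and one outlier at the query point $(5, 0)$, the distance function at that point drops from 4 to 0, whereas the DTM with $k = 10$ stays above 3 and the kernel distance changes by less than 0.1.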

Statistical analysis and parameter selection for Mapper

Participants: Steve Oudot, Bertrand Michel, Mathieu Carrière.

In [44], we study the statistical convergence of the 1-dimensional Mapper to its continuous analogue, the Reeb graph. We show that the Mapper is an optimal estimator of the Reeb graph, which gives, as a byproduct, a method to automatically tune its parameters and compute confidence regions on its topological features, such as its loops and flares. This avoids the brute-force approach of testing a large grid of parameters and keeping the most stable ones, which is widely used in visualization, clustering and feature selection with the Mapper.
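As a reminder of what is being tuned, here is a minimal sketch of the 1-dimensional Mapper: a filter $f$, a cover of its range by intervals of length `r` overlapping by a fraction `g`, single-linkage clustering at scale `eps` of each preimage, and an edge between any two clusters that share points. The parameter and function names are our own, and the sketch does not implement the automatic parameter selection of [44]; it only shows the quantities that such a selection would set.

```python
import math


def cover(fmin, fmax, r, g):
    """Intervals of length r, consecutive starts r * (1 - g) apart (0 <= g < 1)."""
    step = r * (1.0 - g)
    out, k = [], 0
    while k == 0 or fmin + k * step < fmax:
        s = fmin + k * step
        out.append((s, s + r))
        k += 1
    return out


def clusters(idx, points, eps):
    """Single-linkage clusters: connected components of the eps-neighbourhood graph."""
    parent = {i: i for i in idx}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            i, j = idx[a], idx[b]
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)
    groups = {}
    for i in idx:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


def mapper(points, f, r, g, eps):
    """Return (nodes, edges): nodes are sets of point indices, edges index pairs."""
    values = [f(p) for p in points]
    nodes = []
    for lo, hi in cover(min(values), max(values), r, g):
        idx = [i for i, v in enumerate(values) if lo <= v <= hi]
        nodes.extend(clusters(idx, points, eps))
    edges = {(a, b) for a in range(len(nodes)) for b in range(a + 1, len(nodes))
             if nodes[a] & nodes[b]}
    return nodes, edges


if __name__ == "__main__":
    circle = [(math.cos(2 * math.pi * i / 200), math.sin(2 * math.pi * i / 200))
              for i in range(200)]
    nodes, edges = mapper(circle, lambda p: p[0], r=0.5, g=0.25, eps=0.2)
    print(len(nodes), "nodes,", len(edges), "edges")
```

On 200 points of the unit circle with the $x$-coordinate as filter, `r = 0.5`, `g = 0.25` and `eps = 0.2`, the output has 9 nodes and 9 edges forming a single loop, matching the Reeb graph of the circle. Changing `r`, `g` or `eps` can merge or split nodes, which is exactly the sensitivity that a principled parameter choice addresses.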

Sliced Wasserstein Kernel for Persistence Diagrams

Participants: Steve Oudot, Mathieu Carrière.

In collaboration with M. Cuturi (ENSAE).

Persistence diagrams (PDs) play a key role in topological data analysis (TDA), where they are routinely used to succinctly describe the topological properties of complicated shapes. PDs enjoy strong stability properties and have proven their utility in various learning contexts. They do not, however, live in a space naturally endowed with a Hilbert structure, and are usually compared with specific distances, such as the bottleneck distance. To incorporate PDs in a learning pipeline, several kernels have been proposed for PDs, with a strong emphasis on the stability of the RKHS distance w.r.t. perturbations of the PDs. In [27], we use the Sliced Wasserstein approximation of the Wasserstein distance to define a new kernel for PDs, which is not only provably stable but also provably discriminative w.r.t. the Wasserstein distance $W_{1}^{\infty}$ between PDs. We also demonstrate its practicality by developing an approximation technique to reduce kernel computation time, and show that our proposal compares favorably to existing kernels for PDs on several benchmarks.
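A minimal sketch of the Sliced Wasserstein distance between two diagrams, and of a kernel built on it, is given below. It assumes that each diagram is first augmented with the projections onto the diagonal of the other diagram's points, that both point sets are projected onto $M$ evenly spaced lines through the origin with angles in $[-\pi/2, \pi/2)$, that the 1D Wasserstein distances on those lines (computed by sorting) are averaged over directions, and that the kernel is $\exp(-SW/(2σ^{2}))$. The function names and default parameters are hypothetical.

```python
import math


def _diag(p):
    """Orthogonal projection of a diagram point (birth, death) onto the diagonal."""
    m = (p[0] + p[1]) / 2.0
    return (m, m)


def sliced_wasserstein(d1, d2, n_dirs=50):
    """Approximate Sliced Wasserstein distance between two persistence diagrams.

    Each diagram is augmented with the diagonal projections of the other, so
    both point sets have len(d1) + len(d2) points; on each line the optimal
    1D matching is obtained by sorting the projected values.
    """
    a = list(d1) + [_diag(p) for p in d2]
    b = list(d2) + [_diag(p) for p in d1]
    step = math.pi / n_dirs
    total = 0.0
    for i in range(n_dirs):
        theta = -math.pi / 2 + i * step
        c, s = math.cos(theta), math.sin(theta)
        pa = sorted(x * c + y * s for x, y in a)
        pb = sorted(x * c + y * s for x, y in b)
        total += step * sum(abs(u - v) for u, v in zip(pa, pb))
    return total / math.pi  # Riemann sum of (1/pi) * integral over theta


def sw_kernel(d1, d2, sigma=1.0, n_dirs=50):
    """Kernel exp(-SW(d1, d2) / (2 sigma^2))."""
    return math.exp(-sliced_wasserstein(d1, d2, n_dirs) / (2.0 * sigma ** 2))
```

The design choice that keeps the cost low is that each direction only requires sorting, i.e. $O(n \log n)$ operations for diagrams with $n$ points in total, instead of solving an optimal matching problem.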

An introduction to Topological Data Analysis: fundamental and practical aspects for data scientists

Participants: Frédéric Chazal, Bertrand Michel.

Topological Data Analysis (TDA) is a recent and fast-growing field providing a set of new topological and geometric tools to infer relevant features from possibly complex data. In [45], we propose a brief introduction, through a few selected recent and state-of-the-art topics, to the fundamental and practical aspects of TDA for non-experts.
