DATASHAPE

DATASHAPE - 2023

2023 Activity report, Project-Team DATASHAPE

RNSR: 201622050C

Research centers: Inria Saclay Centre at Université Paris-Saclay; Inria Centre at Université Côte d'Azur
In partnership with: Université Paris-Saclay, CNRS
Team name: Understanding the shape of data
In collaboration with: Laboratoire de mathématiques d'Orsay de l'Université de Paris-Sud (LMO)
Domain: Algorithmics, Programming, Software and Architecture
Theme: Algorithmics, Computer Algebra and Cryptology

Keywords

Computer Science and Digital Science

A3. Data and knowledge
A3.4. Machine learning and statistics
A7.1. Algorithms
A8. Mathematics of computing
A8.1. Discrete mathematics, combinatorics
A8.3. Geometry, Topology
A9. Artificial intelligence

1 Team members, visitors, external collaborators

Research Scientists

Frederic Chazal [Team leader, INRIA, Senior Researcher, HDR]
Charles Arnal [INRIA, Starting Research Position, from Nov 2023]
Jean-Daniel Boissonnat [INRIA, Emeritus, HDR]
Mathieu Carrière [INRIA, Researcher]
David Cohen-Steiner [INRIA, Researcher]
Marc Glisse [INRIA, Researcher]
Jisu Kim [INRIA, Starting Research Position, until Mar 2023]
Clément Maria [INRIA, Researcher]
Nina Otter [INRIA, ISFP, from Oct 2023]
Mathijs Wintraecken [INRIA, ISFP, from Feb 2023]

Faculty Members

Gilles Blanchard [UNIV PARIS SACLAY, Associate Professor, HDR]
Blanche Buet [UNIV PARIS SACLAY, Associate Professor]
Pierre Pansu [UNIV PARIS SACLAY, Associate Professor, HDR]

Post-Doctoral Fellows

Charles Arnal [INRIA, Post-Doctoral Fellow, until Oct 2023]
Solenne Gaucher [INRIA, Post-Doctoral Fellow, until Aug 2023]
Felix Hensel [INRIA, Post-Doctoral Fellow, until May 2023]

PhD Students

Charly Boricaud [UNIV PARIS SACLAY]
Jeremie Capitao-Miniconi [UNIV PARIS SACLAY]
Antoine Commaret [UNIV COTE D'AZUR]
Bastien Dussap [UNIV PARIS SACLAY]
Henrique Ennes [UNIV COTE D'AZUR, from Oct 2023]
Laure Ferraris [INRIA]
Georg Alexander Gruetzner [STUDIENSTIFTUNG, until Mar 2023]
Alexandre Guérin [SYSNAV]
Olympio Hacquard [UNIV PARIS SACLAY, until Aug 2023]
Hugo Henneuse [UNIV PARIS SACLAY]
Vadim Lebovici [ENS PARIS, until Aug 2023]
David Loiseaux [INRIA]
Wojciech Reise [INRIA, until Nov 2023]
Christophe Vuong [TELECOM PARIS]

Technical Staff

Vincent Rouvreau [INRIA, Engineer]
Hannah Schreiber [INRIA, Engineer]

Interns and Apprentices

Raphael De Maleprade [ENS PARIS-SACLAY, Intern, from Apr 2023 until Aug 2023]
Simon Delalande [INRIA, Intern, until Feb 2023]
Mohamed Hedi Derbel [INRIA, Intern, from Apr 2023 until Aug 2023]
Andrea Vanessa Hurtado Quiceno [INRIA, Intern, from May 2023 until Aug 2023]

Administrative Assistants

Aissatou-Sadio Diallo [INRIA]
Sophie Honnorat [INRIA]

Visiting Scientist

John Harvey [UNIV CARDIFF, until Jan 2023]

External Collaborators

Jisu Kim [LMO, from Apr 2023 until Aug 2023]
Bertrand Michel [CENTRALE NANTES]

2 Overall objectives

DataShape is a research project in Topological Data Analysis (TDA), a recent field whose aim is to uncover, understand and exploit the topological and geometric structure underlying complex and possibly high dimensional data. The overall objective of the DataShape project is to settle the mathematical, statistical and algorithmic foundations of TDA and to disseminate and promote our results in the data science community.

The approach of DataShape relies on the conviction that it is necessary to combine statistical, topological/geometric and computational approaches in a common framework in order to face the challenges of TDA. Another conviction of DataShape is that TDA needs to be combined with other data science approaches and tools to lead to successful real applications. TDA challenges must therefore be addressed simultaneously from the fundamental and applied sides.

The team members have actively contributed to the emergence of TDA during the last few years. The variety of expertise, from fundamental mathematics to software development, the strong interactions within our team, and numerous well-established international collaborations make our group particularly well placed to achieve these goals.

The expected output of DataShape is two-fold. First, we intend to set up and develop the mathematical, statistical and algorithmic foundations of Topological and Geometric Data Analysis. Second, we intend to pursue the development of the GUDHI platform, which was initiated by team members and is becoming a standard tool in TDA, in order to provide an efficient state-of-the-art toolbox for understanding the topology and geometry of data. The ultimate goal of DataShape is to develop and promote TDA as a new family of well-founded methods to uncover and exploit the geometry of data. This also includes clarifying the position and complementarity of TDA with respect to other approaches and tools in data science. Our objective is also to provide practically efficient and flexible tools that can be used independently, complementarily or in combination with other classical data analysis and machine learning approaches.

3 Research program

3.1 Algorithmic aspects and new mathematical directions for topological and geometric data analysis

TDA requires constructing and manipulating appropriate representations of complex and high-dimensional shapes. A major difficulty comes from the fact that the complexity of the data structures and algorithms used to approximate shapes grows rapidly as the dimensionality increases, which makes them intractable in high dimensions. We focus our research on simplicial complexes, which offer a convenient representation of general shapes and generalize graphs and triangulations. Our work includes the study of simplicial complexes with good approximation properties and the design of compact data structures to represent them.
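As a small illustration of why the size of such representations explodes with dimension, the sketch below builds a Vietoris–Rips complex (one of the simplicial complexes offered by GUDHI) by brute force: every subset of at most `max_dim + 1` points whose pairwise distances are all at most `r` becomes a simplex, with filtration value equal to its longest edge. It is a pure-Python teaching sketch, not GUDHI's implementation; the function name `rips_complex` and its parameters are hypothetical.

```python
import itertools
import math


def rips_complex(points, r, max_dim=2):
    """Brute-force Vietoris-Rips complex (hypothetical helper, for illustration).

    Returns a list of (simplex, filtration) pairs, where a simplex is a sorted
    tuple of point indices and its filtration value is its longest edge.
    """
    n = len(points)
    # Pairwise distances, computed once.
    dist = [[math.dist(p, q) for q in points] for p in points]
    simplices = [((i,), 0.0) for i in range(n)]
    # A k-point simplex is a k-clique of the r-neighbourhood graph; enumerating
    # all k-subsets costs O(n^k), which is why this approach does not scale.
    for k in range(2, max_dim + 2):
        for s in itertools.combinations(range(n), k):
            diam = max(dist[a][b] for a, b in itertools.combinations(s, 2))
            if diam <= r:
                simplices.append((s, diam))
    return simplices


# Four corners of a unit square: the diagonals (length sqrt(2)) exceed r,
# so the complex is a 4-cycle without triangles, i.e. it has a 1-dimensional hole.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
cx = rips_complex(square, r=1.01)
print(sum(1 for s, _ in cx if len(s) == 2), sum(1 for s, _ in cx if len(s) == 3))
```

Raising `r` above $\sqrt{2}$ fills in all four triangles and the hole disappears; persistent homology tracks exactly such changes across all values of `r`.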

In low dimensions, effective shape reconstruction techniques exist that can provide precise geometric approximations very efficiently and under reasonable sampling conditions. Extending those techniques to higher dimensions, as is required in the context of TDA, is problematic, since almost all methods in low dimensions rely on the computation of a subdivision of the ambient space. A direct extension of those methods would immediately lead to algorithms whose complexities depend exponentially on the ambient dimension, which is prohibitive in most applications. A first direction to bypass the curse of dimensionality is to develop algorithms whose complexities depend on the intrinsic dimension of the data (which is usually small, although unknown) rather than on the dimension of the ambient space. Another direction is to resort to cruder approximations that only capture the homotopy type or the homology of the sampled shape. The recent theory of persistent homology provides a powerful and robust tool to study the homology of sampled spaces in a stable way.

3.2 Statistical aspects of topological and geometric data analysis

The wide variety of ever larger available data, often corrupted by noise and outliers, requires considering the statistical properties of their topological and geometric features and proposing new relevant statistical models for their study.

There exist various statistical and machine learning methods intended to uncover the geometric structure of data. Beyond manifold learning and dimensionality reduction approaches, which generally do not allow one to assess the relevance of the inferred topological and geometric features and are not well suited to the analysis of complex topological structures, set estimation methods aim to estimate, from random samples, a set around which the data is concentrated. In these methods, which include support and manifold estimation, principal curves/manifolds and their various generalizations, to name a few, the estimation problems are usually considered under losses, such as the Hausdorff distance or the symmetric difference, that are not sensitive to the topology of the estimated sets, preventing these tools from directly inferring topological or geometric information.

Regarding purely topological features, the statistical estimation of the homology or homotopy type of compact subsets of Euclidean spaces has only been considered recently, most of the time under the rather restrictive assumption that the data are randomly sampled from smooth manifolds.

In a more general setting, with the emergence of new geometric inference tools based on the study of distance functions, and of algebraic topology tools such as persistent homology, computational topology has recently developed significantly, offering a new set of methods to infer relevant topological and geometric features of data sampled in general metric spaces. The use of these tools remains largely heuristic, and until recently there were only a few preliminary results establishing connections between geometric inference, persistent homology and statistics. However, this direction has attracted a lot of attention over the last three years. In particular, stability properties and new representations of persistent homology information have led to very promising results, to which DataShape members have significantly contributed. These preliminary results open many perspectives and research directions that need to be explored.

Our goal is to build on our first statistical results in TDA to develop the mathematical foundations of Statistical Topological and Geometric Data Analysis. Combined with the other objectives, our ultimate goal is to provide a well-founded and effective statistical toolbox for understanding the topology and geometry of data.

3.3 Topological and geometric approaches for machine learning

This objective is driven by the problems raised by the use of topological and geometric approaches in machine learning. The goal is both to use our techniques to better understand the role of topological and geometric structures in machine learning problems, and to apply our TDA tools to develop specialized topological approaches to be used in combination with other machine learning methods.

3.4 Experimental research and software development

We develop a high-quality open source software platform called GUDHI, which is becoming a reference in geometric and topological data analysis in high dimensions. The goal is not to provide code tailored to the numerous potential applications, but rather to provide the central data structures and algorithms that underlie applications in geometric and topological data analysis.

The development of the GUDHI platform also serves to benchmark and optimize new algorithmic solutions resulting from our theoretical work. Such development requires a whole line of research on software architecture and interface design, heuristics and fine-tuned optimization, robustness and arithmetic issues, and visualization. We aim to provide a full programming environment following the same recipes that made the CGAL library, the reference library in computational geometry, a success.

Some of the algorithms implemented on the platform will also be interfaced with other software platforms, such as the R software for statistical computing, and with languages such as Python, in order to make them usable in combination with other data analysis and machine learning tools. A first step in this direction was the creation of an R package called TDA, in collaboration with the group of Larry Wasserman at Carnegie Mellon University (Inria Associated team CATS), which already includes some functionalities of the GUDHI library and implements some joint results between our team and the CMU team. A similar interface with the Python language is also considered a priority. To go even further towards helping users, we will provide utilities that perform the most common tasks without requiring any programming at all.

4 Application domains

Our work is mostly of a fundamental mathematical and algorithmic nature but finds a variety of applications in data analysis, e.g., in material science, biology, sensor networks, 3D shape analysis and processing, to name a few.

More specifically, DataShape is working on the analysis of trajectories obtained from inertial sensors (PhD theses of Wojciech Reise and Alexandre Guérin with Sysnav, participation in the DGA/ANR challenge MALIN with Sysnav) and, more generally, on the development of new TDA methods for machine learning and artificial intelligence applied to (multivariate) time-dependent data from various kinds of sensors, in collaboration with Fujitsu, or to high-dimensional point cloud data, with Metafora.

DataShape is also working in collaboration with Columbia University in New York, especially with the Rabadan lab, to improve bioinformatics methods and analyses for single-cell genomic data. In particular, a significant part of this work aims to use TDA tools such as persistent homology and the Mapper algorithm to characterize and quantify biological phenomena that occur in large-scale single-cell data sets, and to assess their statistical significance. Such biological phenomena include, among others, the cell cycle, the functional differentiation of stem cells, and immune system responses to breast cancer (such as the spatial response at the tissue location and the genomic response with protein expression).

5 Social and environmental responsibility

5.1 Footprint of research activities

The weekly research seminar of DataShape now takes place in hybrid mode. Travel by team members has been substantially reduced in recent years to limit the team's environmental footprint.

6 Highlights of the year

6.1 Awards

Bastien Dussap obtained the best student paper award at ECML-PKDD as first author of 26.

6.2 Events

We organized a one-week team workshop in May 2023, giving all the PhD students, postdocs and researchers of the team the opportunity to present their work and discuss scientific questions together. Two external researchers, Simon Masnou (Université Lyon 1) and Rémy Leclercq (Université Paris-Saclay), were also invited to give mini-courses.

6.3 PhD defenses

Georg Gruetzner. Möbius spaces and large scale geometry. May 2023.
Olympio Hacquard. From topological features to machine learning models: a journey through persistence diagrams. September 2023.
Vadim Lebovici. Two complementary approaches in multi-parameter persistence: interval-decompositions and constructible functions. September 2023.
Wojciech Reise. Topological techniques for inference on periodic functions with phase variation. December 2023.
Christophe Vuong. Contributions à l’analyse stochastique pour structures sans propriété de diffusion (Contributions to stochastic analysis for structures without the diffusion property). December 2023.

7 New software, platforms, open data

7.1 New software

7.1.1 GUDHI

Name:
Geometric Understanding in Higher Dimensions
Keywords:
Computational geometry, Topology, Clustering
Scientific Description:

The Gudhi library is an open source library for Computational Topology and Topological Data Analysis (TDA). It offers state-of-the-art algorithms to construct various types of simplicial complexes, data structures to represent them, and algorithms to compute geometric approximations of shapes and persistent homology.

The GUDHI library offers the following interoperable modules:

- Complexes:
  + Cubical
  + Simplicial: Rips, Witness, Alpha and Čech complexes
  + Cover: Nerve and Graph induced complexes
- Data structures and basic operations:
  + Simplex tree, Skeleton blockers and Toplex map
  + Construction, update, filtration and simplification
- Topological descriptors computation
- Manifold reconstruction
- Topological descriptors tools:
  + Bottleneck and Wasserstein distance
  + Statistical tools
  + Persistence diagram and barcode
Functional Description:
The GUDHI open source library will provide the central data structures and algorithms that underlie applications in geometry understanding in higher dimensions. It is intended both to help the development of new algorithmic solutions inside and outside the project and to facilitate the transfer of results to applied fields.
News of the Year:

Below is a list of changes made since GUDHI 3.7.0 (December 2022):

PersLay: a TensorFlow layer for persistence diagram representations.

Cover complex: new classes to compute Mapper, Graph Induced complexes and Nerves with a scikit-learn-like interface.

Persistent cohomology: new linear-time compute_persistence_of_function_on_line, also available through CubicalPersistence in Python.

Cubical complex: added the possibility to build a lower-star filtration from vertices instead of top-dimensional cubes. Much faster implementation for the 2D case with input from top-dimensional cells.

Wasserstein distance: the Hera version now provides the matching in its interface.

Subsampling: new choose_n_farthest_points_metric as a faster alternative to choose_n_farthest_points.

SimplexTree: SimplexTree can now be used with Python pickle. A helper for_each_simplex applies a given function object to each simplex. A new option link_nodes_by_label, when set to true, speeds up access to cofaces and stars. A new option stable_simplex_handles, when set to true, keeps Simplex handles valid even after insertions or removals.

Čech complex: a function assign_MEB_filtration that, given a simplicial complex and an embedding of its vertices, assigns to each simplex a filtration value equal to the squared radius of its minimal enclosing ball (MEB). Applied to a Delaunay triangulation, it computes the Delaunay-Čech filtration.

Edge collapse: a Python function reduce_graph to simplify a clique filtration (represented as a sparse weighted graph) while preserving its persistent homology.
URL:
https://gudhi.inria.fr/
Publication:
hal-01108461
Contact:
Marc Glisse
Participants:
Clément Maria, François Godi, David Salinas, Jean-Daniel Boissonnat, Marc Glisse, Mariette Yvinec, Pawel Dlotko, Siargey Kachanovich, Vincent Rouvreau, Mathieu Carrière, Clément Jamin, Siddharth Pritam, Frederic Chazal, Steve Oudot, Wojciech Reise, Hind Montassif, Hannah Schreiber, Martin Royer, David Loiseaux
Partners:
Université Côte d'Azur (UCA), Fujitsu
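The subsampling functions listed in the news above select points by farthest-point sampling: starting from one point, repeatedly add the point whose distance to the already selected set is largest. A minimal pure-Python sketch under that description follows, using Euclidean distances; it is not GUDHI's implementation and ignores its options, and the name `farthest_point_sampling` is hypothetical. Each step updates the distances to the selected set in linear time, so selecting $k$ points out of $n$ costs $O(kn)$ distance evaluations.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedy farthest-point subsampling; returns indices in selection order."""
    k = min(k, len(points))
    if k == 0:
        return []
    chosen = [start]
    # d[i] = distance from points[i] to the current selection.
    d = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        i = max(range(len(points)), key=d.__getitem__)
        chosen.append(i)
        # Only the newly added point can decrease the distances.
        d = [min(di, math.dist(p, points[i])) for di, p in zip(d, points)]
    return chosen


pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
print(farthest_point_sampling(pts, 3))  # -> [0, 2, 3]
```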

7.2 Open data

The TOPAL database of topological quantum invariants of knots and 3-manifolds. Contact: Clément MARIA

8 New results

8.1 Algorithmic aspects and new mathematical directions for topological and geometric data analysis

8.1.1 Fast persistent homology computation for functions on ℝ

Participant: Marc Glisse.

0-dimensional persistent homology is known, from a computational point of view, as the easy case. Indeed, given a list of $n$ edges in non-decreasing order of filtration value, one only needs a union-find data structure to keep track of the connected components, and the persistence diagram is obtained in time $O(n\,α(n))$, where $α$ is the inverse Ackermann function. The running time is thus usually dominated by sorting the edges, which takes $Θ(n \log n)$ time. A little-known fact is that, in the particularly simple case of studying the sublevel sets of a piecewise-linear function on $ℝ$ or $𝕊^{1}$, persistence can actually be computed in linear time. The note 49 presents a simple algorithm that achieves this complexity, together with an extension to image persistence. An implementation is available in GUDHI.
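For comparison, the classical approach described at the start of this paragraph can be sketched as follows for a function sampled on a path: sort the edges by filtration value, then merge components with union-find, pairing each merge according to the elder rule. This is the $Θ(n \log n)$ baseline, not the linear-time algorithm of the note 49, and it does not use GUDHI; the function name is hypothetical.

```python
import math


def sublevel_persistence_on_line(values):
    """0-dimensional sublevel-set persistence of a PL function on a path.

    values[i] is the function value at vertex i; the edge (i, i+1) appears
    at max(values[i], values[i+1]). Returns (birth, death) pairs with
    death > birth; the component of the global minimum never dies (death = inf).
    """
    n = len(values)
    parent = list(range(n))

    def find(x):
        # Union-find with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs = []
    # Sorting the edges is the Theta(n log n) bottleneck.
    for i in sorted(range(n - 1), key=lambda i: max(values[i], values[i + 1])):
        t = max(values[i], values[i + 1])
        a, b = find(i), find(i + 1)
        # Each root is the minimum of its component. Elder rule: the component
        # born later (higher minimum) dies when the two components merge.
        young, old = (a, b) if values[a] > values[b] else (b, a)
        if t > values[young]:
            pairs.append((values[young], t))
        parent[young] = old
    roots = {find(i) for i in range(n)}
    pairs.extend((values[r], math.inf) for r in roots)
    return pairs


print(sorted(sublevel_persistence_on_line([0, 3, 1, 4, 2])))
```

On the input `[0, 3, 1, 4, 2]`, the local minima 1 and 2 die at levels 3 and 4, and the global minimum 0 gives the essential bar.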

8.1.2 Hausdorff and Gromov-Hausdorff Stable Subsets of the Medial Axis

Participant: Mathijs Wintraecken.

In collaboration with André Lieutier.

In 27, we introduce a pruning of the medial axis called the $(λ, α)$-medial axis ($\mathrm{ax}_{λ}^{α}$). We prove that the $(λ, α)$-medial axis of a set $K$ is stable in a Gromov-Hausdorff sense under weak assumptions. More formally, we prove that if $K$ and $K'$ are close in the Hausdorff ($d_{H}$) sense, then the $(λ, α)$-medial axes of $K$ and $K'$ are close as metric spaces, that is, the Gromov-Hausdorff distance ($d_{GH}$) between the two is $\frac{1}{4}$-Hölder in the sense that $d_{GH}(\mathrm{ax}_{λ}^{α}(K), \mathrm{ax}_{λ}^{α}(K')) \lesssim d_{H}(K, K')^{1/4}$. The Hausdorff distance between the two medial axes is also bounded: $d_{H}(\mathrm{ax}_{λ}^{α}(K), \mathrm{ax}_{λ}^{α}(K')) \lesssim d_{H}(K, K')^{1/2}$. These quantified stability results provide guarantees for practical computations of medial axes from approximations. Moreover, they provide key ingredients for studying the computability of the medial axis in the context of computable analysis.

8.1.3 Tracing Isomanifolds in $ℝ^{d}$ in Time Polynomial in $d$ using Coxeter–Freudenthal–Kuhn Triangulations

Participants: Jean-Daniel Boissonnat, Siargey Kachanovich, Mathijs Wintraecken.

Isomanifolds are the generalization of isosurfaces to arbitrary dimension and codimension, i.e. submanifolds of $ℝ^{d}$ defined as the zero set of a multivariate multivalued smooth function $f : ℝ^{d} \to ℝ^{d - n}$, where $n$ is the intrinsic dimension of the manifold and we assume that 0 is a regular value. A natural way to approximate a smooth isomanifold $ℳ = f^{-1}(0)$ is to consider its piecewise-linear (PL) approximation $\hat{ℳ}$ based on a triangulation $𝒯$ of the ambient space $ℝ^{d}$ whose longest edge has length $D$. In 12, we describe a simple algorithm to trace isomanifolds from a given starting point on each connected component. The algorithm works for arbitrary dimensions $n$ and $d$, and for arbitrary $D$. Our main result is that, when $f$ (or $ℳ$) has bounded complexity, the complexity of the algorithm is polynomial in $d$ and $δ = 1/D$ (and unavoidably exponential in $n$). Since it is known that for $δ = Ω(d^{2.5})$, $\hat{ℳ}$ is $O(D^{2})$-close and isotopic to $ℳ$, our algorithm produces a faithful PL approximation of isomanifolds of bounded complexity in time polynomial in $d$. Combining this algorithm with dimensionality reduction techniques, the dependency on $d$ in the size of $\hat{ℳ}$ can be completely removed with high probability. We also show that the algorithm can handle isomanifolds with boundary and, more generally, isostratifolds. The algorithm for isomanifolds with boundary has been implemented, and the reported experimental results show that it is practical and can handle cases well beyond the state of the art.
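One convenient property of Freudenthal–Kuhn-type triangulations in high dimension is that locating the simplex containing a point is cheap. As a minimal sketch, assuming the standard Freudenthal–Kuhn triangulation of the integer grid $ℤ^{d}$ (unit cubes, each split into $d!$ simplices) rather than the scaled or transformed versions that may be used in 12, the simplex containing a point is found by sorting the fractional parts of its coordinates, in $O(d \log d)$ time. The helper `kuhn_simplex` is hypothetical.

```python
import math


def kuhn_simplex(x):
    """Locate x in the Freudenthal-Kuhn triangulation of the integer grid Z^d.

    Returns (vertices, weights): the d+1 integer vertices of the simplex that
    contains x, and the barycentric coordinates of x in that simplex.
    """
    base = [math.floor(c) for c in x]
    frac = [c - b for c, b in zip(x, base)]
    # In the local unit cube, the simplex is {y : 1 >= y[p0] >= y[p1] >= ... >= 0},
    # where p orders the coordinates by decreasing fractional part.
    order = sorted(range(len(x)), key=lambda i: -frac[i])
    vertex = list(base)
    vertices = [tuple(vertex)]
    for i in order:
        vertex[i] += 1  # walk along one cube edge per coordinate
        vertices.append(tuple(vertex))
    f = [1.0] + [frac[i] for i in order] + [0.0]
    weights = [f[k] - f[k + 1] for k in range(len(x) + 1)]
    return vertices, weights


verts, w = kuhn_simplex([0.3, 1.7, -0.2])
print(verts)
```

Here the vertices are `(0, 1, -1)`, `(0, 1, 0)`, `(0, 2, 0)` and `(1, 2, 0)`; the weights are non-negative, sum to 1 and reproduce the input point.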

8.1.4 The reach of subsets of manifolds

Participants: Jean-Daniel Boissonnat, Mathijs Wintraecken.

Kleinjohann (Archiv der Mathematik 35(1):574–582, 1980; Mathematische Zeitschrift 176(3), 327–344, 1981) and Bangert (Archiv der Mathematik 38(1):54–57, 1982) extended the reach $\mathrm{rch}(𝒮)$ from subsets $𝒮$ of Euclidean space to the reach $\mathrm{rch}_{ℳ}(𝒮)$ of subsets $𝒮$ of Riemannian manifolds $ℳ$, where $ℳ$ is smooth (we assume it is at least $C^{3}$). Bangert showed that sets of positive reach in Euclidean space and in Riemannian manifolds are very similar. In 13, we introduce a slight variant of Kleinjohann's and Bangert's extension and quantify the similarity between sets of positive reach in Euclidean space and in Riemannian manifolds in a new way: given $𝒮 \subseteq ℳ$ and $p \in ℳ$, we bound the local feature size (a local version of the reach) of its lifting to the tangent space via the inverse exponential map, $\exp_{p}^{-1}(𝒮)$, at $q$, assuming that $\mathrm{rch}_{ℳ}(𝒮)$ and the geodesic distance $d_{ℳ}(p, q)$ are bounded. These bounds are motivated by the importance of the reach and the local feature size to manifold learning, topological inference and triangulating manifolds, and by the fact that intrinsic approaches circumvent the curse of dimensionality.

8.1.5 Local Criteria for Triangulating General Manifolds

Participants: Jean-Daniel Boissonnat, Mathijs Wintraecken.

In collaboration with Arijit Ghosh and Ramsay Dyer.

In 11, we present criteria for establishing a triangulation of a manifold. Given a manifold $M$ , a simplicial complex $A$ , and a map $H$ from the underlying space of $A$ to $M$ , our criteria are presented in local coordinate charts for $M$ , and ensure that $H$ is a homeomorphism. These criteria do not require a differentiable structure, or even an explicit metric on $M$ . No Delaunay property of $A$ is assumed. The result provides a triangulation guarantee for algorithms that construct a simplicial complex by working in local coordinate patches. Because the criteria are easily verified in such a setting, they are expected to be of general use.

8.1.6 Local characterizations for decomposability of 2-parameter persistence modules

Participant: Vadim Lebovici.

In collaboration with Steve Oudot (Inria) and M. B. Botnan (Vrije Universiteit)

In 14, we investigate the existence of sufficient local conditions under which poset representations decompose as direct sums of indecomposables from a given class. In our work, the indexing poset is the product of two totally ordered sets, corresponding to the setting of 2-parameter persistence in topological data analysis. Our indecomposables of interest belong to the so-called interval modules, which by definition are indicator representations of intervals in the poset. While the whole class of interval modules does not admit such a local characterization, we show that the subclass of rectangle modules does admit one and that it is, in some precise sense, the largest subclass to do so.

8.1.7 Discrete Morse Theory for Computing Zigzag Persistence

Participants: Clément Maria, Hannah Schreiber.

In 23, we introduce a theoretical and computational framework that uses discrete Morse theory as an efficient preprocessing step for computing zigzag persistent homology. From a zigzag filtration of complexes $K_{i}$, we introduce a zigzag Morse filtration whose complexes $A_{i}$ are Morse reductions of the original complexes $K_{i}$, and we prove that both filtrations have the same persistent homology. The maps in the zigzag Morse filtration are forward and backward inclusions, as is standard in zigzag persistence, together with a new type of map inducing non-trivial changes in the boundary operator of the Morse complex. We study this last map in detail and design algorithms to compute the update both at the complex level and at the homology matrix level when computing zigzag persistence. We deduce an algorithm to compute the zigzag persistence of a filtration that depends mostly on the number of critical cells of the complexes, and show experimentally that it performs better in practice.

8.2 Statistical aspects of topological and geometric data analysis

8.2.1 On the persistent homology of almost surely $C^{0}$ stochastic processes

Participant: Daniel Perez.

The paper 24 investigates the properties of the persistence diagrams stemming from almost surely continuous random processes on $[0, t]$. It focuses on two variables which together characterize the barcode: the number $N^{x, x + ε}$ of points of the persistence diagram inside the rectangle $]-\infty, x] \times [x + ε, \infty[$, and the number $N^{ε}$ of bars of length $\geq ε$. For processes with the strong Markov property, we show that both of these variables admit a moment generating function and, in particular, moments of every order. Switching our attention to semimartingales, we determine the asymptotic behaviour of $N^{ε}$ and $N^{x, x + ε}$ as $ε \to 0$ and of $N^{ε}$ as $ε \to \infty$. Finally, we study the repercussions of the classical stability theorem of barcodes and illustrate our results with some examples, most notably Brownian motion and empirical functions converging to the Brownian bridge.
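For intuition about $N^{ε}$ and $N^{x, x + ε}$, the sketch below computes them for a function sampled on a grid, such as a discretised random walk standing in for Brownian motion. It assumes the 0-dimensional sublevel-set barcode of the piecewise-linear interpolation and computes it from a direct quadratic-time characterisation (each vertex dies at the lowest level at which it connects to an older vertex, i.e. one with a strictly lower value, ties broken by index) rather than an efficient algorithm. All function names are hypothetical, and the sketch only illustrates the two counting variables, not the results of 24.

```python
import math
import random


def barcode_on_grid(values):
    """Sublevel-set 0-dim barcode of a sampled function (O(n^2) characterisation).

    Vertex i dies at the lowest level at which it connects to an 'older' vertex j
    (strictly lower value, or equal value with j < i); the minimum never dies.
    Only bars of positive length are returned.
    """
    n = len(values)
    bars = []
    for i, v in enumerate(values):
        death = math.inf
        for step in (-1, 1):  # search leftwards, then rightwards
            height, j = v, i + step
            while 0 <= j < n:
                height = max(height, values[j])
                if values[j] < v or (values[j] == v and j < i):
                    death = min(death, height)
                    break
                j += step
        if death > v:
            bars.append((v, death))
    return bars


def n_eps(bars, eps):
    """N^eps: number of bars of length >= eps."""
    return sum(1 for b, d in bars if d - b >= eps)


def n_rect(bars, x, eps):
    """N^{x, x+eps}: number of diagram points in ]-inf, x] x [x+eps, inf[."""
    return sum(1 for b, d in bars if b <= x and d >= x + eps)


# Usage on a simulated random walk (a crude stand-in for Brownian motion).
rng = random.Random(0)
walk = [0.0]
for _ in range(1000):
    walk.append(walk[-1] + rng.gauss(0, 0.03))
bars = barcode_on_grid(walk)
print(n_eps(bars, 0.1), n_rect(bars, 0.0, 0.1))
```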

8.2.2 Topological signatures of periodic-like signals

Participants: Wojciech Reise, Frédéric Chazal.

In collaboration with Bertrand Michel (Ecole Centrale Nantes)

In 58, we present a method to construct signatures of periodic-like data. Based on topological considerations, our construction encodes information about the order and values of local extrema. Its main strength is robustness to reparametrisation of the observed signal, so that it depends only on the form of the periodic function. The signature converges as the observation contains increasingly many periods. We show that it can be estimated from the observation of a single time series using bootstrap techniques.

8.2.3 Heat diffusion distance processes: a statistically founded method to analyze graph data sets

Participant: Etienne Lasalle.

In 21, we propose two multiscale comparisons of graphs using heat diffusion, which make it possible to compare graphs without node correspondence or even with different sizes. These multiscale comparisons lead to the definition of Lipschitz-continuous empirical processes indexed by a real parameter. The statistical properties of empirical means of such processes are studied in the general case. Under mild assumptions, we prove a functional Central Limit Theorem, as well as a Gaussian approximation with a rate depending only on the sample size. Applied to our processes, these results make it possible to analyze data sets of pairs of graphs. We design consistent confidence bands around empirical means and consistent two-sample tests, using bootstrap methods. Their performance is evaluated by simulations on synthetic data sets.
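As a minimal illustration of a multiscale comparison that needs no node correspondence, the sketch below computes the heat trace $h_G(t) = \operatorname{tr} \exp(-t L_G)$, where $L_G$ is the combinatorial graph Laplacian, and compares two graphs through the process $t \mapsto |h_G(t) - h_H(t)|$. The heat trace is invariant under relabelling of the nodes and is defined for graphs of any size; it is used here only as an example, and we do not claim it is either of the two comparisons defined in 21. The matrix exponential is computed in pure Python by scaling and squaring with a truncated Taylor series, which is adequate only for small graphs; all names are hypothetical.

```python
import math


def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected graph on nodes 0..n-1."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][v] -= 1.0
        L[v][u] -= 1.0
        L[u][u] += 1.0
        L[v][v] += 1.0
    return L


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def heat_kernel(L, t, terms=20):
    """exp(-t L) by scaling and squaring with a truncated Taylor series."""
    n = len(L)
    norm = t * max((sum(abs(x) for x in row) for row in L), default=0.0)
    s = 0
    while norm / 2 ** s > 0.5:  # scale so that the Taylor series converges fast
        s += 1
    A = [[-t * x / 2 ** s for x in row] for row in L]
    E = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in E]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, A)]  # A^k / k!
        E = [[e + x for e, x in zip(re, rt)] for re, rt in zip(E, term)]
    for _ in range(s):  # undo the scaling: exp(B) = exp(B / 2^s)^(2^s)
        E = matmul(E, E)
    return E


def heat_trace(n, edges, t):
    K = heat_kernel(laplacian(n, edges), t)
    return sum(K[i][i] for i in range(n))


def heat_trace_distance(g, h, ts):
    """The discretised process t -> |h_G(t) - h_H(t)| on a grid of scales ts."""
    return [abs(heat_trace(*g, t) - heat_trace(*h, t)) for t in ts]
```

For a single edge, the Laplacian has eigenvalues 0 and 2, so `heat_trace(2, [(0, 1)], t)` approximates $1 + e^{-2t}$; two labellings of the same path give the same heat trace at every scale.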

8.2.4 Support and distribution inference from noisy data

Participant: Jérémie Capitao-Miniconi.

In collaboration with E. Gassiat (LMO, Univ. Paris-Saclay) and L. Lehéricy (Univ. Côte d'Azur)

In 44, we consider noisy observations of a distribution with unknown support. In the deconvolution model, it has been proved recently that, under very mild assumptions, it is possible to solve the deconvolution problem without knowing the noise distribution and with no sample of the noise. We first give general settings where the theory applies and provide classes of supports that can be recovered in this context. We then exhibit classes of distributions over which we prove adaptive minimax rates (up to a log log factor) for the estimation of the support in Hausdorff distance. Moreover, for the class of distributions with compact support, we provide estimators of the unknown (in general singular) distribution and prove maximum rates in Wasserstein distance. We also prove an almost matching lower bound on the associated minimax risk.

8.2.5 A gradient sampling algorithm for stratified maps with applications to topological data analysis

Participant: Mathieu Carrière.

In collaboration with J. Leygonie (Arda), T. Lacombe (Univ. Gustave Eiffel) and S. Oudot (Geomerix, Inria Saclay)

We introduce a novel gradient descent algorithm extending the well-known Gradient Sampling methodology to the class of stratifiably smooth objective functions, which are defined as locally Lipschitz functions that are smooth on some regular pieces, called the strata, of the ambient Euclidean space. For this class of functions, our algorithm achieves a sub-linear convergence rate. We then apply our method to objective functions based on the (extended) persistent homology map computed over lower-star filters, which is a central tool of Topological Data Analysis. For this, we propose an efficient exploration of the corresponding stratification using the Cayley graph of the permutation group. Finally, we provide benchmarks and novel topological optimization problems to demonstrate the utility and applicability of our framework.

8.3 Topological and geometric approaches for machine learning

8.3.1 Choosing the parameter of the Fermat distance: navigating geometry and noise.

Participants: Frédéric Chazal, Laure Ferraris.

In collaboration with P. Groisman, M. Jonckheere, F. Pascal and F. Sapienza

The Fermat distance has recently been established as a useful tool for machine learning tasks when a natural distance is not directly available to the practitioner, or to improve the results given by Euclidean distances by exploiting the geometrical and statistical properties of the dataset. This distance depends on a parameter $α$ that greatly impacts the performance of subsequent tasks. Ideally, the value of $α$ should be large enough to navigate the geometric intricacies inherent to the problem. At the same time, it should remain small enough to avoid the deleterious effects of noise during distance estimation. In 46, we study both theoretically and through simulations how to select this parameter.
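The sample version of the Fermat distance can be computed as a shortest-path distance. As a minimal sketch, assuming the common definition in which the distance between two sample points is the minimum, over all paths through the sample, of $\sum_{i} |q_{i+1} - q_{i}|^{α}$ (a normalisation by a power of the sample size, used in the literature, is omitted), Dijkstra's algorithm on the complete graph gives all distances from one source point. With $α = 1$ this reduces to the Euclidean distance; larger $α$ penalises long jumps and favours paths through dense regions. The function name is hypothetical.

```python
import heapq
import math


def sample_fermat_distances(points, alpha, source):
    """Sample Fermat distances from points[source] to every point (Dijkstra).

    Edge (i, j) of the complete graph has weight |x_i - x_j| ** alpha, alpha >= 1.
    """
    n = len(points)
    dist = [math.inf] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        for j in range(n):
            if j != i:
                nd = d + math.dist(points[i], points[j]) ** alpha
                if nd < dist[j]:
                    dist[j] = nd
                    heapq.heappush(heap, (nd, j))
    return dist


# Three collinear points: with alpha = 3, hopping through the middle point
# (cost 1 + 1) is cheaper than the direct jump (cost 2 ** 3 = 8).
print(sample_fermat_distances([(0.0,), (1.0,), (2.0,)], alpha=3, source=0))  # -> [0.0, 1.0, 2.0]
```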

8.3.2 MAGDiff: Covariate Data Set Shift Detection via Activation Graphs of Deep Neural Networks

Participants: Felix Hensel, Charles Arnal, Mathieu Carrière, Frédéric Chazal.

In collaboration with T. Lacombe (Univ. G. Eiffel), H. Kurihara (Fujitsu), Y. Ike (Kyushu Univ.)

Despite their successful application to a variety of tasks, neural networks remain limited, like other machine learning methods, by their sensitivity to shifts in the data: their performance can be severely impacted by differences in distribution between the data on which they were trained and the data on which they are deployed. In 54, we propose a new family of representations, called MAGDiff, that we extract from any given neural network classifier and that allows for efficient covariate data shift detection without the need to train a new model dedicated to this task. These representations are computed by comparing the activation graphs of the neural network for samples belonging to the training distribution and to the target distribution, and yield powerful data- and task-adapted statistics for the two-sample tests commonly used for data set shift detection. We demonstrate this empirically by measuring the statistical power of two-sample Kolmogorov-Smirnov (KS) tests on several different data sets and shift types, and showing that our novel representations induce significant improvements over a state-of-the-art baseline relying on the network output.
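The two-sample KS test mentioned above compares the empirical cumulative distribution functions of a one-dimensional statistic computed on a training sample and on a target sample. A minimal pure-Python sketch of the KS statistic follows; p-value computation is omitted (in practice a library routine such as `scipy.stats.ks_2samp` would be used), and the sample values are hypothetical stand-ins for any scalar feature, not actual MAGDiff outputs.

```python
def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic sup_t |F_x(t) - F_y(t)|."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        t = min(xs[i], ys[j])
        # Advance both empirical CDFs past every value equal to t (handles ties).
        while i < n and xs[i] == t:
            i += 1
        while j < m and ys[j] == t:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted its CDF equals 1 and the gap can only shrink.
    return d


train_stat = [0.1, 0.4, 0.35, 0.8]   # hypothetical statistic on training data
target_stat = [0.9, 0.7, 0.95, 0.85]  # hypothetical statistic on target data
print(ks_statistic(train_stat, target_stat))
```

A larger statistic signals a larger discrepancy between the two samples; the point of task-adapted representations is to make this discrepancy easier to detect when a shift is present.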

8.3.3 Label Shift Quantification with Robustness Guarantees via Distribution Feature Matching

Participants: Bastien Dussap, Gilles Blanchard.

In collaboration with Badr-Edine Chérief-Abdellatif (CNRS, LPSM, U. Paris-Sorbonne)

Quantification learning deals with the task of estimating the target label distribution under label shift. In 26, we established a unifying framework, distribution feature matching (DFM), that recovers as particular instances various estimators introduced in previous literature. We derived a general performance bound for DFM procedures, improving in several key aspects upon previous bounds derived in particular cases. We then extended this analysis to study robustness of DFM procedures in the misspecified setting under departure from the exact label shift hypothesis, in particular in the case of contamination of the target by an unknown distribution. These theoretical findings were confirmed by a detailed numerical study on simulated and real-world datasets. We also introduced an efficient, scalable and robust version of kernel-based DFM using the Random Fourier Feature principle.
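As an illustration of the distribution feature matching principle, here is a minimal sketch assuming two classes, one-dimensional data and a Gaussian kernel approximated with random Fourier features: the mean embedding of the unlabelled target sample is matched with a convex combination of the class-conditional mean embeddings of the labelled source sample. With two classes the constrained least-squares problem has a closed form (a projection clipped to [0, 1]); with more classes it becomes a small quadratic program over the simplex. All names, data and parameter values are hypothetical, and this is not the estimator or code of the paper.

```python
import math
import random


def rff_map(dim, sigma, rng):
    """Random Fourier features approximating exp(-(x - y)**2 / (2 * sigma**2)) on R."""
    ws = [rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]  # frequencies ~ N(0, 1 / sigma^2)
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(dim)]  # random phases
    scale = math.sqrt(2.0 / dim)
    return lambda x: [scale * math.cos(w * x + b) for w, b in zip(ws, bs)]


def mean_feature(phi, sample):
    """Empirical mean embedding of a sample in the random feature space."""
    feats = [phi(x) for x in sample]
    return [sum(col) / len(feats) for col in zip(*feats)]


def dfm_two_classes(phi, source0, source1, target):
    """Hypothetical two-class DFM estimate of the target proportion theta of class 1.

    Minimises ||theta * mu1 + (1 - theta) * mu0 - muT||^2 over theta in [0, 1]:
    the unconstrained minimiser is a one-dimensional projection, then clipped.
    """
    mu0, mu1, mut = (mean_feature(phi, s) for s in (source0, source1, target))
    diff = [a - b for a, b in zip(mu1, mu0)]
    num = sum((t - b) * d for t, b, d in zip(mut, mu0, diff))
    den = sum(d * d for d in diff)
    return min(1.0, max(0.0, num / den))


rng = random.Random(0)
phi = rff_map(dim=100, sigma=1.0, rng=rng)
source0 = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # labelled source data, class 0
source1 = [rng.gauss(3.0, 1.0) for _ in range(1000)]  # labelled source data, class 1
# Unlabelled target sample whose label distribution has shifted to 30% of class 1.
target = [rng.gauss(3.0, 1.0) for _ in range(300)] + [rng.gauss(0.0, 1.0) for _ in range(700)]
theta_hat = dfm_two_classes(phi, source0, source1, target)
print(f"estimated proportion of class 1: {theta_hat:.3f}")
```

Random Fourier features keep the cost linear in the sample sizes, since only finite-dimensional mean vectors are compared instead of full kernel matrices.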

This paper received the "Best student paper" award at the ECML/PKDD conference 2023.

8.3.4 Post hoc false discovery proportion inference under a Hidden Markov Model

Participant: Gilles Blanchard.

In collaboration with Marie Perrot-Dockès (MAP5, U. Paris), Étienne Roquain (LPSM, U. Paris-Sorbonne), Pierre Neuvial (CNRS, U. Toulouse)

In 25, we addressed the multiple testing problem under the assumption that the true/false hypotheses are driven by a Hidden Markov Model (HMM), which has been recognized as a fundamental setting to model multiple testing under dependence since the seminal work of Sun and Cai (2009). While previous work has concentrated on deriving specific procedures with a controlled False Discovery Rate (FDR) under this model, following a recent trend in selective inference, we considered the problem of establishing confidence bounds on the false discovery proportion (FDP), for a user-selected set of hypotheses that can depend on the observed data in an arbitrary way. We developed a methodology to construct such confidence bounds, first when the HMM is known, then when its parameters are unknown and estimated, including the data distribution under the null and the alternative, using a nonparametric approach. In the latter case, we proposed a bootstrap-based methodology to take into account the effect of parameter estimation error. We showed that taking advantage of the assumed HMM structure allows for a substantial improvement of confidence bound sharpness over existing agnostic (structure-free) methods, as witnessed both via numerical experiments and real data examples.
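When the HMM is known, a natural building block (used, for instance, in the approach of Sun and Cai) is the posterior probability that each hypothesis is null given all the observations, which the forward-backward algorithm computes in linear time. Summing these posteriors over a selected set gives the conditional expectation of the number of false discoveries in that set; the confidence bounds of the paper go beyond this expectation and are not reproduced here. A minimal sketch with a two-state chain and given emission densities (all values hypothetical):

```python
def posterior_null(lik0, lik1, init, trans):
    """Forward-backward posteriors P(H_i = 0 | X_1, ..., X_n) in a two-state HMM.

    lik0[i], lik1[i]: density of X_i under the null (state 0) and the alternative (state 1).
    init[s] = P(H_1 = s); trans[a][b] = P(H_{i+1} = b | H_i = a).
    Messages are renormalised at each step to avoid numerical underflow.
    """
    n = len(lik0)
    lik = list(zip(lik0, lik1))
    # Forward pass: alpha[i][s] proportional to P(H_i = s, X_1, ..., X_i).
    alpha = []
    for i in range(n):
        if i == 0:
            a = [init[s] * lik[0][s] for s in (0, 1)]
        else:
            a = [sum(alpha[-1][r] * trans[r][s] for r in (0, 1)) * lik[i][s] for s in (0, 1)]
        total = a[0] + a[1]
        alpha.append([a[0] / total, a[1] / total])
    # Backward pass: beta[i][s] proportional to P(X_{i+1}, ..., X_n | H_i = s).
    beta = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        b = [sum(trans[s][r] * lik[i + 1][r] * beta[i + 1][r] for r in (0, 1)) for s in (0, 1)]
        total = b[0] + b[1]
        beta[i] = [b[0] / total, b[1] / total]
    # Combine: posterior is proportional to alpha * beta, normalised over the two states.
    post = []
    for i in range(n):
        w0, w1 = alpha[i][0] * beta[i][0], alpha[i][1] * beta[i][1]
        post.append(w0 / (w0 + w1))
    return post


trans = [[0.9, 0.1], [0.1, 0.9]]  # hypothetical persistent hidden states
post = posterior_null([1, 1, 1, 1], [1, 1, 100, 1], init=[0.5, 0.5], trans=trans)
selected = [2]  # hypothetical data-dependent selection
expected_false = sum(post[i] for i in selected)
print(post, expected_false)
```

The Markov dependence lets evidence at one position lower the null posterior of its neighbours as well, which is the kind of structure that agnostic methods cannot exploit.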

8.3.5 Stable vectorization of multiparameter persistent homology using signed barcodes as measures

Participants: Mathieu Carrière, David Loiseaux.

In collaboration with Luis Scoccola (Oxford University), Steve Oudot (Geomerix, Inria Saclay) and Magnus Botnan (Vrije Universiteit)

Persistent homology (PH) provides topological descriptors for geometric data, such as weighted graphs, which are interpretable, stable to perturbations, and invariant under, e.g., relabeling. Most applications of PH focus on the one-parameter case (where the descriptors summarize the changes in topology of data as it is filtered by a single quantity of interest), and there is now a wide array of methods enabling the use of one-parameter PH descriptors in data science, which rely on the stable vectorization of these descriptors as elements of a Hilbert space. Although the multiparameter PH (MPH) of data that is filtered by several quantities of interest encodes much richer information than its one-parameter counterpart, the scarcity of stability results for MPH descriptors has so far limited the available options for the stable vectorization of MPH. In this paper, we aim to bring together the best of both worlds by showing how the interpretation of signed barcodes (a recent family of MPH descriptors) as signed measures leads to natural extensions of vectorization strategies from one parameter to multiple parameters. The resulting feature vectors are easy to define and to compute, and provably stable. While, as a proof of concept, we focus on simple choices of signed barcodes and vectorizations, we already observe notable performance improvements when comparing our feature vectors to state-of-the-art topology-based methods on various types of data.
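The vectorization step can be illustrated on a signed measure that is assumed to have already been computed from a signed barcode and is given as weighted points in the plane: convolving it with a Gaussian kernel and sampling the result on a grid yields a fixed-size feature vector, in which positive and negative atoms can cancel. This minimal sketch (atoms, grid and bandwidth are hypothetical) is one simple instance of a kernel-based vectorization, not the paper's implementation.

```python
import math


def convolve_signed_measure(points, weights, grid, bandwidth):
    """Vectorise a signed point measure sum_i w_i * delta_{x_i} in the plane.

    Grid point g receives sum_i w_i * exp(-|g - x_i|^2 / (2 * bandwidth^2)),
    i.e. the measure convolved with a Gaussian kernel and sampled on the grid.
    """
    return [
        sum(w * math.exp(-math.dist(g, x) ** 2 / (2 * bandwidth ** 2))
            for x, w in zip(points, weights))
        for g in grid
    ]


# Hypothetical signed measure: two positive atoms and one negative atom.
points = [(0.0, 1.0), (1.0, 2.0), (1.0, 1.0)]
weights = [1, 1, -1]
grid = [(i / 2, j / 2) for i in range(5) for j in range(5)]  # 5 x 5 grid on [0, 2]^2
vec = convolve_signed_measure(points, weights, grid, bandwidth=0.5)
print(len(vec), [round(v, 3) for v in vec[:5]])
```

Because the map is linear in the measure, small changes in the positions or weights of the atoms produce small changes in the vector, which is the kind of behaviour stability results formalize.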

8.3.6 A framework for fast and stable representations of multiparameter persistent homology decompositions

Participants: Mathieu Carrière, David Loiseaux.

In collaboration with Andrew Blumberg (Columbia University, NYC)

Topological data analysis (TDA) is an area of data science that focuses on using invariants from algebraic topology to provide multiscale shape descriptors for geometric data sets such as point clouds. One of the most important such descriptors is persistent homology, which encodes the change in shape as a filtration parameter changes; a typical parameter is the feature scale. For many data sets, it is useful to simultaneously vary multiple filtration parameters, for example feature scale and density. While the theoretical properties of single parameter persistent homology are well understood, less is known about the multiparameter case. In particular, a central question is the problem of representing multiparameter persistent homology by elements of a vector space for integration with standard machine learning algorithms. Existing approaches to this problem either ignore most of the multiparameter information to reduce to the one-parameter case or are heuristic and potentially unstable in the face of noise. In this article, we introduce a new general representation framework that leverages recent results on decompositions of multiparameter persistent homology. This framework is rich in information, fast to compute, and encompasses previous approaches. Moreover, we establish theoretical stability guarantees under this framework as well as efficient algorithms for practical computation, making this framework an applicable and versatile tool for analyzing geometric and point cloud data. We validate our stability results and algorithms with numerical experiments that demonstrate statistical convergence, prediction accuracy, and fast running times on several real data sets.

8.4 Algorithmic and Combinatorial Aspects of Low Dimensional Topology

8.4.1 An algorithm for Tambara-Yamagami quantum invariants of 3-manifolds, parameterized by the first Betti number

Participant: Clément Maria.

In collaboration with Colleen Delaney (Berkeley), Eric Samperton (Purdue).

Quantum topology provides various frameworks for defining and computing invariants of manifolds. One such framework of substantial interest in both mathematics and physics is the Turaev-Viro-Barrett-Westbury state sum construction, which uses the data of a spherical fusion category to define topological invariants of triangulated 3-manifolds via tensor network contractions. In this work 47 we consider a restricted class of state sum invariants of 3-manifolds derived from Tambara-Yamagami categories. These categories are particularly simple, being entirely specified by three pieces of data: a finite abelian group, a bicharacter of that group, and a sign $\pm 1$ . Despite being one of the simplest sources of state sum invariants, the computational complexities of Tambara-Yamagami invariants are yet to be fully understood. We make substantial progress on this problem. Our main result is the existence of a general fixed parameter tractable algorithm for all such topological invariants, where the parameter is the first Betti number of the 3-manifold with $ℤ / 2 ℤ$ coefficients. We also explain that these invariants are sometimes #P-hard to compute (and we expect that this is almost always the case). Contrary to other domains of computational topology, such as graphs on surfaces, very few hard problems in 3-manifold topology are known to admit FPT algorithms with a topological parameter. However, such algorithms are of particular interest as their complexity depends only polynomially on the combinatorial representation of the input, regardless of size or combinatorial width. Additionally, in the case of Betti numbers, the parameter itself is easily computable in polynomial time.
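The parameter of this algorithm, the first Betti number with $ℤ / 2 ℤ$ coefficients, can be computed in polynomial time by Gaussian elimination over the two-element field, using b₁ = dim C₁ − rank ∂₁ − rank ∂₂. A minimal sketch, assuming a simplicial complex given by sorted vertex tuples (the example is a triangle rather than a 3-manifold triangulation and only illustrates the linear algebra; names are hypothetical):

```python
def rank_gf2(rows):
    """Rank over Z/2Z of a 0/1 matrix whose rows are encoded as integer bitmasks."""
    pivots = {}  # leading bit -> reduced row having that leading bit
    for r in rows:
        while r:
            lead = r.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = r
                break
            r ^= pivots[lead]  # cancel the leading bit (addition mod 2)
    return len(pivots)


def boundary_rows(simplices, faces):
    """Rows of the mod-2 boundary matrix: one bitmask per simplex, over its facets."""
    index = {f: k for k, f in enumerate(faces)}
    rows = []
    for s in simplices:
        mask = 0
        for v in range(len(s)):
            mask |= 1 << index[s[:v] + s[v + 1:]]  # drop vertex v to get a facet
        rows.append(mask)
    return rows


def betti1_mod2(vertices, edges, triangles):
    """b_1 = dim C_1 - rank(boundary_1) - rank(boundary_2) with Z/2Z coefficients."""
    r1 = rank_gf2(boundary_rows(edges, vertices))
    r2 = rank_gf2(boundary_rows(triangles, edges))
    return len(edges) - r1 - r2


vertices = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]
print(betti1_mod2(vertices, edges, []))           # hollow triangle: one loop
print(betti1_mod2(vertices, edges, [(0, 1, 2)]))  # filled triangle: no loop
```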

8.4.2 Hard Diagrams of the Unknot

Participant: Clément Maria.

In collaboration with Benjamin Burton (University of Queensland), Hsien-Chih Chang (Dartmouth College), Maarten Löffler (TU/e - Eindhoven University of Technology), Arnaud de Mesmay (LIGM - Laboratoire d'Informatique Gaspard-Monge), Saul Schleimer (University of Warwick), Eric Sedgwick (DePaul University), Jonathan Spreer (The University of Sydney).

In 15, we present three “hard” diagrams of the unknot. They require (at least) three extra crossings before they can be simplified to the trivial unknot diagram via Reidemeister moves in $S^{2}$. These examples are constructed by applying previously proposed methods. The proof of their hardness uses significant computational resources. We also determine that no small “standard” example of a hard unknot diagram requires more than one extra crossing for Reidemeister moves in $S^{2}$.

8.5 Miscellaneous

8.5.1 Variational Shape Reconstruction via Quadric Error Metrics

Participant: David Cohen-Steiner.

In collaboration with Tong Zhao, Pierre Alliez (Inria team Titane), Laurent Busé (Inria team Aromath), Tamy Boubekeur and Jean-Marc Thiery (Adobe Research)

Inspired by the strengths of quadric error metrics initially designed for mesh decimation, we propose a concise mesh reconstruction approach for 3D point clouds 32. Our approach proceeds by clustering the input points enriched with quadric error metrics, where the generator of each cluster is the optimal 3D point for the sum of its quadric error metrics. This approach favors the placement of generators on sharp features, and tends to equidistribute the error among clusters. We reconstruct the output surface mesh from the adjacency between clusters and a constrained binary solver. We combine our clustering process with an adaptive refinement driven by the error. Compared to prior art, our method avoids dense reconstruction prior to simplification and directly produces an optimized mesh.
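In the classical quadric error metric, the quadric of a plane n·x + d = 0 with unit normal n gives the squared distance (n·x + d)² to that plane; summing the quadrics of a cluster and minimizing the sum gives its optimal generator, obtained by solving a 3×3 linear system. The minimal sketch below (pure Python, Cramer's rule, no regularization, names hypothetical) shows why generators are attracted to sharp features: the optimum for three orthogonal planes is their common corner.

```python
def plane_quadric(normal, offset):
    """Quadric (A, b, c) of the plane n . x + d = 0, with error(x) = x^T A x + 2 b . x + c."""
    A = [[normal[i] * normal[j] for j in range(3)] for i in range(3)]
    b = [offset * n for n in normal]
    return A, b, offset * offset


def sum_quadrics(quadrics):
    """Quadrics add up, so a cluster is summarised by a single (A, b, c)."""
    A = [[sum(q[0][i][j] for q in quadrics) for j in range(3)] for i in range(3)]
    b = [sum(q[1][i] for q in quadrics) for i in range(3)]
    return A, b, sum(q[2] for q in quadrics)


def quadric_error(q, x):
    A, b, c = q
    quad = sum(x[i] * A[i][j] * x[j] for i in range(3) for j in range(3))
    return quad + 2 * sum(b[i] * x[i] for i in range(3)) + c


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def optimal_point(q, eps=1e-12):
    """Minimiser of the quadric: solution of A x = -b, via Cramer's rule."""
    A, b, _ = q
    d = det3(A)
    if abs(d) < eps:  # e.g. all planes (nearly) parallel: no unique optimum
        raise ValueError("singular quadric")
    rhs = [-v for v in b]
    return [det3([[rhs[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]) / d
            for k in range(3)]


# Three orthogonal planes x = 1, y = 2, z = 3 meeting at a sharp corner.
corner = sum_quadrics([plane_quadric((1, 0, 0), -1.0),
                       plane_quadric((0, 1, 0), -2.0),
                       plane_quadric((0, 0, 1), -3.0)])
x = optimal_point(corner)
print(x, quadric_error(corner, x))
```

This sketch simply raises an error when the system is singular; a practical implementation needs some way to handle nearly flat clusters.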

8.5.2 Two Lower Bounds for Random Point Sets via Negative Association

Participant: Marc Glisse.

In collaboration with Denys Bulavka (Charles University, Prague), Olivier Devillers (Inria team Gamble), Philippe Duchon (Laboratoire Bordelais de Recherche en Informatique) and Xavier Goaoc (Inria team Gamble)

In 43, we present two lower bounds that hold with high probability for random point sets. We first give a new and elementary proof that the classical models of random point sets (uniform in a smooth convex body, uniform in a polygon, Gaussian) have a superconstant number of extreme points with high probability. We next prove that any algorithm that determines the orientation of all triples in a planar set of $n$ points (that is, the order type of the point set) from their Cartesian coordinates must, with high probability, read $4 n \log n - O(n \log \log n)$ coordinate bits. This matches previously known upper bounds. Both bounds rely on a method due to Dubhashi and Ranjan (Random Structures and Algorithms, 1998) for obtaining concentration results via a negative association property.
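The two objects involved in these bounds are easy to compute from coordinates: the order type records, for every triple of points, the sign of a 2×2 determinant, and the extreme points are the vertices of the convex hull. A minimal sketch of both (pure Python; the hull routine is Andrew's monotone chain, chosen here for brevity and unrelated to the paper's proofs):

```python
from itertools import combinations


def orientation(p, q, r):
    """Sign of det[q - p, r - p]: +1 counterclockwise, -1 clockwise, 0 collinear."""
    det = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (det > 0) - (det < 0)


def order_type(points):
    """Orientation of every triple i < j < k, i.e. the order type of the point set."""
    return {t: orientation(*(points[i] for i in t))
            for t in combinations(range(len(points)), 3)}


def extreme_points(points):
    """Vertices of the convex hull (Andrew's monotone chain), using only orientations."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def chain(seq):
        out = []
        for p in seq:
            # Pop while the last turn is not strictly counterclockwise.
            while len(out) >= 2 and orientation(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
        return out

    lower, upper = chain(pts), chain(reversed(pts))
    return lower[:-1] + upper[:-1]


square = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(extreme_points(square), len(order_type(square)))
```

The order type of n points has C(n, 3) entries; the lower bound above concerns how many input bits any algorithm must read to determine all of them.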

8.5.3 Harmonic analysis on the boundary of hyperbolic groups

Participant: Georg Gruetzner.

In 51, we show that a Möbius structure $ℳ$ of dimension $Q$ has a minimal Ahlfors-David constant, which shows that a Möbius space is uniformly $Q$-Ahlfors-David regular. More generally, many classical theorems of harmonic analysis on $ℝ^{n}$ admit a Möbius-invariant formulation in the context of Möbius geometry. We use this observation to show that the Knapp-Stein operator

$$(I_{d}^{α} u_{d})(x) = \int \frac{u_{d}(y)}{d(x, y)^{Q - α}} \, dμ_{d}(y), \qquad 0 < α < \frac{Q}{2},$$

is a continuous operator on the weighted $L^{2}$-space $L^{2}\big((d'/d)^{α} \, dμ_{d}\big)$, with a norm independent of $d$ and $d'$.

From here we construct a Sobolev space $ℋ_{d}^{-α}$ on $s$-densities, for a given $s$ depending on $α$. The goal is to show that the topology of this space does not depend on the metric $d$; in this paper we prove that the corresponding norms are comparable on a large class of functions.

The work is inspired by a paper of Astengo, Cowling and Di Blasio, who construct uniformly bounded representations for simple Lie groups of rank 1. We formulate the problem in the much more general framework of groups acting on Möbius structures, which includes, in particular, all hyperbolic groups.

8.5.4 Constant regret for sequence prediction with limited advice

Participant: Gilles Blanchard.

In collaboration with El Mehdi Saad (CentraleSupelec, U. Paris-Saclay)

In 30, we investigated the problem of cumulative regret minimization for individual sequence prediction with respect to the best expert in a finite family of size $K$, under limited access to information. We assume that in each round the learner can predict using a convex combination of at most $p$ experts, and can then observe a posteriori the losses of at most $m$ experts. We assume that the loss function is range-bounded and exp-concave. In the standard multi-armed bandit setting, when the learner is allowed to play only one expert per round and observe only its feedback, known optimal regret bounds are of order $𝒪 (\sqrt{K T})$. We show that allowing the learner to play one additional expert per round and observe one additional feedback substantially improves the guarantees on regret. We provide a strategy combining only $p = 2$ experts per round for prediction and observing $m \geq 2$ experts' losses. Its randomized regret (with respect to the internal randomization of the learner's strategy) is of order $𝒪 ((K / m) \log (K δ^{-1}))$ with probability $1 - δ$, i.e., it is independent of the horizon $T$ (“constant” or “fast rate” regret) if $p \geq 2$ and $m \geq 3$. We prove that this rate is optimal up to a logarithmic factor in $K$. In the case $p = m = 2$, we provide an upper bound of order $𝒪 (K^{2} \log (K δ^{-1}))$ with probability $1 - δ$. Our strategies require no prior knowledge of the horizon $T$ or of the confidence parameter $δ$. Finally, we show that if the learner is constrained to observe only one expert's feedback per round, the worst-case regret is the “slow rate” $Ω (\sqrt{K T})$, suggesting that synchronous observation of at least two experts per round is necessary to achieve constant regret.

8.5.5 Flagfolds

Participant: Blanche Buet.

In collaboration with Xavier Pennec (INRIA Team Epione, Sophia Antipolis)

In 42, by interpreting the output of Principal Component Analysis, namely the covariance matrix, as a sequence of nested subspaces naturally equipped with weights according to the level of approximation they provide, we embed all $d$-dimensional Grassmannians into a stratified space of covariance matrices. We observe that Grassmannians constitute the lowest-dimensional skeleton of the stratification, while it is possible to define a Riemannian metric on the highest-dimensional, dense stratum that is compatible with the global stratification. With such a Riemannian metric at hand, one can look for geodesics between two linear subspaces of different dimensions that, unlike Euclidean geodesics, do not go through higher-dimensional linear subspaces. Building upon this embedding of Grassmannians into the stratified space of covariance matrices, we generalize the concept of varifolds to what we call flagfolds, in order to model multi-dimensional shapes.

9 Bilateral contracts and grants with industry

9.1 Bilateral contracts with industry

Participants: Alexandre Guerin, Frédéric Chazal.

Collaboration with Sysnav, a French SME with world-leading expertise in navigation and geopositioning in extreme environments, on TDA, geometric approaches and machine learning for the analysis of movements of pedestrians and patients equipped with inertial sensors (CIFRE PhD of Alexandre Guérin).

Participants: Felix Hensel, Theo Lacombe, Marc Glisse, Mathieu Carrière, Frédéric Chazal.

Research collaboration with Fujitsu on the development of new TDA methods and tools for Machine learning and Artificial Intelligence (started in Dec 2017).

Participants: Bastien Dussap, Marc Glisse, Gilles Blanchard.

Research collaboration with MetaFora on the development of new TDA-based and statistical methods for the analysis of cytometric data (started in Nov. 2019).

Participant: David Cohen-Steiner.

Collaboration with Dassault Systèmes and Inria team Geomerix (Saclay) on the applications of methods from geometric measure theory to the modelling and processing of complex 3D shapes (PhD of Lucas Brifault, started in May 2022).

10 Partnerships and cooperations

10.1 International research visitors

10.1.1 Visits of international scientists

Other international visits to the team

Wolfgang Polonik

Status: Researcher
Institution of origin: UC Davis
Country: USA
Dates: December 2023
Context of the visit: Research stay, PhD jury
Mobility program/type of mobility: Research stay

10.1.2 Visits to international teams

Sabbatical programme

Clément Maria

Visited institution: Escola de Matemática Aplicada of the Getúlio Vargas Foundation (Brazil)
Dates of the stay: October 1, 2022 to September 30, 2023
Summary of the stay: This one-year sabbatical stay mainly aimed at setting up and reinforcing new long-term collaborations in computational and applied topology and topological data analysis.

Research stays abroad

Gilles Blanchard

Visited institution: University of Potsdam (Germany)
Dates of the stay: January 1, 2023 to July 31, 2023
Summary of the stay: This was a 7-month research stay for a collaboration with the Collaborative Research Center "Data Assimilation" of the University of Potsdam (speaker: Prof. Sebastian Reich).

10.2 National initiatives

10.2.1 ANR

ANR Chair in AI

Participants: Frédéric Chazal, Marc Glisse, Louis Pujol, Wojciech Reise.

- Acronym : TopAI

- Type : ANR Chair in AI.

- Title : Topological Data Analysis for Machine Learning and AI

- Coordinator : Frédéric Chazal

- Duration : 4 years from September 2020 to August 2024.

- Other partners: Two industrial partners, the French SME Sysnav and the French start-up MetaFora.

- Abstract:

The TopAI project aims at developing a world-leading research activity on topological and geometric approaches in Machine Learning (ML) and AI with a double academic and industrial/societal objective. First, building on the strong expertise of the candidate and his team in TDA, TopAI aims at designing new mathematically well-founded topological and geometric methods and tools for Data Analysis and ML, and at making them available to the data science and AI community through state-of-the-art software tools. Second, thanks to already established close collaborations and the strong involvement of French industrial partners, TopAI aims at exploiting its expertise and tools to address a set of challenging problems with high societal and economic impact in personalized medicine and AI-assisted medical diagnosis.

ANR ALGOKNOT

Participant: Clément Maria.

- Acronym : ALGOKNOT.

- Type : ANR Jeune Chercheuse Jeune Chercheur.

- Title : Algorithmic and Combinatorial Aspects of Knot Theory.

- Coordinator : Clément Maria.

- Duration : 2020 – 2025 (5 years).

- Abstract: The project AlgoKnot aims at strengthening our understanding of the computational and combinatorial complexity of the diverse facets of knot theory, as well as designing efficient algorithms and software to study their interconnections.

- See also: Clément Maria and ANR AlgoKnot.

ANR GeMfaceT

Participant: Blanche Buet.

- Acronym: GeMfaceT.

- Type: ANR JCJC, CES 40 (Mathématiques)

- Title: A bridge between Geometric Measure and Discrete Surface Theories

- Coordinator: Blanche Buet.

- Duration: 48 months, starting October 2021.

- Abstract: This project lies at the interface between geometric measure theory and discrete surface theory. There has recently been a growing interest in non-smooth structures, both from a theoretical point of view, where singularities occur in famous optimization problems such as the Plateau problem or in geometric flows such as mean curvature flow, and from an applied point of view, where complex high-dimensional data are no longer assumed to lie on a smooth manifold but are more singular and allow crossings, tree structures and dimension variations. In this project, we propose to strengthen and expand the use of geometric measure concepts in the study of discrete surfaces and in complex data modelling, and also to use these possibly singular discrete surfaces to compute numerical solutions to the aforementioned problems.

10.2.2 Collaboration with other national research institutes

IFPEN

Participants: Frédéric Chazal, Marc Glisse, Jisu Kim.

Research collaboration between DataShape and IFPEN on TDA applied to various problems arising from the energy transition and sustainable mobility.

Confiance.ai / IRT SystemX

Participant: Frédéric Chazal.

Research collaboration on anomaly detection for multivariate time series using TDA and ML approaches.

10.3 Regional initiatives

Metafora

Participants: Gilles Blanchard, Bastien Dussap, Marc Glisse.

- Type : Paris Region PhD 2021.

- Title : Comparaison de données cytométriques.

The Île-de-France region funds a PhD thesis in collaboration with Metafora biosystems, a company specialized in the analysis of cells through their metabolism. The thesis of Bastien Dussap, supervised by Gilles Blanchard and Marc Glisse, aims to compare blood samples using statistical methods.

11 Dissemination

11.1 Promoting scientific activities

11.1.1 Scientific events: organisation

Member of the organizing committees

Blanche Buet is co-organizing the Harmonic Analysis seminar of the ANH team in Orsay.

11.1.2 Journal

Member of the editorial boards

Gilles Blanchard was a member of the following journal editorial boards: Annals of Statistics, Electronic Journal of Statistics, Bernoulli.
Frédéric Chazal is a member of the following journal editorial boards: Discrete and Computational Geometry (Springer), Graphical Models (Elsevier).
Frédéric Chazal is the Editor-in-Chief of the Journal of Applied and Computational Topology (Springer).
Clément Maria is co-editor of the CGTA Special Issue on algorithmic aspects of computational and applied topology.

11.1.3 Invited talks

Blanche Buet gave talks in several team seminars (Math department of Mulhouse, Analyse/EDP seminar of UCLouvain (Belgium), séminaire du pôle analyse du CMAP (Palaiseau)) and at conferences: Paris-London Analysis Seminar (UCL London, UK), Approximation Theory Workshop at FoCM 2023 (Paris, France), Off-the-grid workshop (IHP, France).
Blanche Buet gave a 3-hour short course on varifolds in Aussois (France).

11.1.4 Leadership within the scientific community

Frédéric Chazal is the Scientific Director of the DATAIA Institute at Université Paris-Saclay.
Frédéric Chazal is a member of the board of directors of the DIM project AI4IDF of the Région Ile-de-France.
Clément Maria is co-head (with Théo Lacombe) of the GT GeoAlgo within the GdR IM.
Clément Maria represents INRIA in the Steering Committee of the QuantAzur Federative Institute on quantum technologies.

11.1.5 Research administration

Pierre Pansu was deputy director of the FMJH until August.
Marc Glisse is president of the CDT at Inria Saclay.
Blanche Buet is a member of the CCUPS (Commission Consultative de l'Université Paris-Saclay), of the Laboratory council and of the "comité parité" (gender parity committee) of LMO.
Blanche Buet is a member of the FMJH postdoc selection committee.

11.2 Teaching - Supervision - Juries

11.2.1 Teaching

Master: Frédéric Chazal, Analyse Topologique des Données, 30h eq-TD, Université Paris-Sud, France.
Master: Clément Maria, Computational Geometry Learning, 18h eq-TD, M2, MPRI, France.
Master: Frédéric Cazals and Mathieu Carrière, Foundations of Geometric Methods in Data Analysis, 24h eq-TD, M2, École CentraleSupélec, France.
Master: Frédéric Cazals and Jean-Daniel Boissonnat and Mathieu Carrière, Geometric and Topological Methods in Machine Learning, 24h eq-TD, M2, Université Côte d'Azur, France.
Master: Frédéric Chazal and Julien Tierny, Topological Data Analysis, 38h eq-TD, M2, Mathématiques, Vision, Apprentissage (MVA), ENS Paris-Saclay, France.
Master: Gilles Blanchard, Mathematics for Artificial Intelligence 1, 70h eq-TD, IMO, Université Paris-Saclay, France.
Master: Blanche Buet, TD-Distributions et analyse de Fourier, 60h eq-TD, M1, Université Paris-Saclay, France.
Master: Marc Glisse, Conception et analyse d'algorithmes, 40h eq-TD, M1, École Polytechnique, France.
Master: Blanche Buet, Short Course Géométrie et approximation at the Rentrée des Masters 2023, Université Paris-Saclay.
Undergrad: Marc Glisse, Mécanismes de la programmation orientée-objet, 40h eq-TD, L3, École Polytechnique, France.
Master: Nina Otter, Probabilités, 25h eq-TD, M1 Mathématiques Fondamentales, Laboratoire de Mathématiques d'Orsay, France.

11.2.2 Supervision

PhD: Vadim Lebovici, Laplace transform for constructible functions. Defended in September 2023. Steve Oudot and François Petit.
PhD: Christophe Vuong, Random hypergraphs. Defended in December 2023. Laurent Decreusefond and Marc Glisse.
PhD in progress: Bastien Dussap, Comparaison de données cytométriques, started October 1st, 2021, Gilles Blanchard and Marc Glisse.
PhD: Olympio Hacquard, Apprentissage statistique par méthodes topologiques et géométriques, defended in September 2023, Gilles Blanchard and Clément Levrard.
PhD in progress: Hannah Marienwald, Transfer learning in high dimension. Started September 2019. Gilles Blanchard and Klaus-Robert Müller.
PhD in progress: Jean-Baptiste Fermanian, Estimation de Kernel Mean Embedding et tests multiples en grande dimension. Started September 2021. Gilles Blanchard and Magalie Fromont-Renoir.
PhD in progress: Antoine Commaret, Persistent Geometry. Started September 2021. David Cohen-Steiner and Indira Chatterji.
PhD in progress: Lucas Brifault, Théorie de la mesure géométrique appliquée pour la modélisation de formes complexes. Started May 2022. David Cohen-Steiner and Mathieu Desbrun.
PhD in progress: David Loiseaux, Multivariate topological data analysis for statistical machine learning. Started November 2021. Mathieu Carrière and Frédéric Cazals.
PhD: Wojciech Reise, TDA for curve data. Defended in December 2023. Frédéric Chazal and Bertrand Michel.
PhD in progress: Alexandre Guérin, Movement analysis from inertial sensors. Started in October 2021. Frédéric Chazal and Bertrand Michel.
PhD in progress: Jérémie Capitao-Miniconi, Deconvolution for singular measures with geometric support. Started in October 2020. Frédéric Chazal and Elisabeth Gassiat.
PhD in progress: Charly Boricaud, Geometric inference for Data analysis: a Geometric Measure Theory perspective. Started in October 2021. Blanche Buet, Gian Paolo Leonardi and Simon Masnou.
PhD in progress: Hugo Henneuse, Statistical Foundations of Topological Data Analysis for multidimensional random fields. Started in October 2022. Frédéric Chazal and Pascal Massart.
PhD in progress: Laure Ferraris, Measure-dependent metric learning and applications in Topological Data Analysis. Started in October 2022. Frédéric Chazal.
PhD: Georg Grützner, Espaces de Möbius et géométrie à grande échelle. Defended in May 2023. Pierre Pansu.
PhD in progress: Henrique Ennes, Computational Complexity Foundations of Quantum Topology. Started in October 2023. Clément Maria and Nicolas Nisse (INRIA).

11.2.3 Juries

PhD defense jury: Nina Otter, PhD defense of Pepijn Roos Hoefgeest (December 2023, Free University Amsterdam, supervisor: Magnus Botnan)
PhD defense jury: Blanche Buet, PhD defense of Elise Bonhomme (December 2023, LMO, supervisor: François Babadjian)

11.3 Popularization

11.3.1 Interventions

Clément Maria, c@fé-in INRIA-UCA, December 2023

12 Scientific production

12.1 Major publications

1 Dominique Attali, Ulrich Bauer, Olivier Devillers, Marc Glisse and André Lieutier. Homological Reconstruction and Simplification in $ℝ^{3}$. Computational Geometry, 2014.
2 Jean-Daniel Boissonnat, Ramsay Dyer and Arijit Ghosh. Delaunay Triangulation of Manifolds. Foundations of Computational Mathematics 45, 2017, 38.
3 Jean-Daniel Boissonnat, Ramsay Dyer, Arijit Ghosh and Steve Y. Oudot. Only distances are required to reconstruct submanifolds. Computational Geometry 66, 2017, 32-67.
4 Jean-Daniel Boissonnat, Karthik C. Srikanta and Sébastien Tavenas. Building Efficient and Compact Data Structures for Simplicial Complexes. Algorithmica, September 2016.
5 Blanche Buet, Gian Paolo Leonardi and Simon Masnou. A Varifold Approach to Surface Approximation. Archive for Rational Mechanics and Analysis 226(2), November 2017, 639-694.
6 Frédéric Chazal, David Cohen-Steiner and André Lieutier. A Sampling Theory for Compact Sets in Euclidean Space. Discrete Comput. Geom. 41(3), 2009, 461-479. URL: http://dx.doi.org/10.1007/s00454-009-9144-8
7 Frédéric Chazal, David Cohen-Steiner and Quentin Mérigot. Geometric Inference for Measures based on Distance Functions. Foundations of Computational Mathematics 11(6), 2011, 733-751 (RR-6930).
8 Frédéric Chazal, Steve Y. Oudot, Marc Glisse and Vin De Silva. The Structure and Stability of Persistence Modules. SpringerBriefs in Mathematics, Springer Verlag, 2016, VII, 116.
9 Leonidas J. Guibas, Steve Y. Oudot, Primoz Skraba and Frédéric Chazal. Persistence-Based Clustering in Riemannian Manifolds. Journal of the ACM 60(6), November 2013, 38.
10 Manish Mandad, David Cohen-Steiner, Leif Kobbelt, Pierre Alliez and Mathieu Desbrun. Variance-Minimizing Transport Plans for Inter-surface Mapping. ACM Transactions on Graphics 36, 2017, 14.

12.2 Publications of the year

International journals

11 Jean-Daniel Boissonnat, Ramsay Dyer, Arijit Ghosh and Mathijs Wintraecken. Local Criteria for Triangulating General Manifolds. Discrete and Computational Geometry, vol. 69, 2023, pp. 156-191.
12 Jean-Daniel Boissonnat, Siargey Kachanovich and Mathijs Wintraecken. Tracing Isomanifolds in ℝ^d in Time Polynomial in d using Coxeter–Freudenthal–Kuhn Triangulations. SIAM Journal on Computing, vol. 52, April 2023, pp. 452-486.
13 Jean-Daniel Boissonnat and Mathijs Wintraecken. The reach of subsets of manifolds. Journal of Applied and Computational Topology, 2023.
14 Magnus Bakke Botnan, Vadim Lebovici and Steve Oudot. Local characterizations for decomposability of 2-parameter persistence modules. Algebras and Representation Theory, 2023.
15 Benjamin Burton, Hsien-Chih Chang, Maarten Löffler, Arnaud de Mesmay, Clément Maria, Saul Schleimer, Eric Sedgwick and Jonathan Spreer. Hard Diagrams of the Unknot. Experimental Mathematics, February 2023, pp. 1-19.
16 Jérémie Capitao-Miniconi and Élisabeth Gassiat. Deconvolution of spherical data corrupted with unknown noise. Electronic Journal of Statistics, vol. 17, no. 1, January 2023.
17 Guillaume Carlier, Alex Delalande and Quentin Merigot. Quantitative Stability of Barycenters in the Wasserstein Space. Probability Theory and Related Fields, October 2023.
18 Otfried Cheong, Olivier Devillers, Ji-Won Park and Marc Glisse. Covering families of triangles. Periodica Mathematica Hungarica, vol. 87, 2023, pp. 86-109.
19 David Cohen-Steiner, André Lieutier and Julien Vuillamy. Delaunay and Regular Triangulations as Lexicographic Optimal Chains. Discrete and Computational Geometry, vol. 70, May 2023, pp. 1-50.
20 Alex Delalande and Quentin Merigot. Quantitative Stability of Optimal Transport Maps under Variations of the Target Measure. Duke Mathematical Journal, 2023.
21 Etienne Lasalle. Heat diffusion distance processes: a statistically founded method to analyze graph data sets. Journal of Applied and Computational Topology, May 2023.
22 Jacob Leygonie, Mathieu Carrière, Théo Lacombe and Steve Oudot. A Gradient Sampling Algorithm for Stratified Maps with Applications to Topological Data Analysis. Mathematical Programming, vol. 202, 2023, pp. 199-239.
23 Clément Maria and Hannah Schreiber. Discrete Morse Theory for Computing Zigzag Persistence. Discrete and Computational Geometry, November 2023, pp. 538-552.
24 Daniel Perez. On the persistent homology of almost surely C^0 stochastic processes. Journal of Applied and Computational Topology, July 2023.
25 Marie Perrot-Dockès, Gilles Blanchard, Pierre Neuvial and Etienne Roquain. Post hoc false discovery proportion inference under a Hidden Markov Model. Test, September 2023.

International peer-reviewed conferences

26 Bastien Dussap, Gilles Blanchard and Badr-Eddine Chérief-Abdellatif. Label Shift Quantification with Robustness Guarantees via Distribution Feature Matching. Machine Learning and Knowledge Discovery in Databases: Research Track (ECML-PKDD 2023), Lecture Notes in Computer Science, vol. 14173, Springer Nature Switzerland, Turin, Italy, June 2023, pp. 69-85.
27 André Lieutier and Mathijs Wintraecken. Hausdorff and Gromov-Hausdorff Stable Subsets of the Medial Axis. STOC 2023: Proceedings of the 55th Annual ACM Symposium on Theory of Computing, Orlando (Florida), United States, June 2023, pp. 1768-1776.
28 David Loiseaux, Mathieu Carrière and Andrew J. Blumberg. A Framework for Fast and Stable Representations of Multiparameter Persistent Homology Decompositions. NeurIPS 2023 - 36th Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems 36, New Orleans (LA), United States, June 2023.
29 David Loiseaux, Luis Scoccola, Mathieu Carrière, Magnus Bakke Botnan and Steve Oudot. Stable Vectorization of Multiparameter Persistent Homology using Signed Barcodes as Measures. NeurIPS 2023 - 36th Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems 36, New Orleans (LA), United States, June 2023.
30 El Mehdi Saad and Gilles Blanchard. Constant regret for sequence prediction with limited advice. Proceedings of The 34th International Conference on Algorithmic Learning Theory (ALT 2023), PMLR, vol. 201, Singapore, February 2023, pp. 1343-1386.
31 El Mehdi Saad, Gilles Blanchard and Nicolas Verzelen. Covariance Adaptive Best Arm Identification. NeurIPS 2023 - Neural Information Processing Systems, Advances in Neural Information Processing Systems 36, New Orleans, United States, 2023.
32 Tong Zhao, Laurent Busé, David Cohen-Steiner, Tamy Boubekeur, Jean-Marc Thiery and Pierre Alliez. Variational Shape Reconstruction via Quadric Error Metrics. SIGGRAPH 2023 - The 50th International Conference & Exhibition On Computer Graphics & Interactive Techniques, Los Angeles, United States, August 2023.

Scientific book chapters

33 Gilles Blanchard and Jean-Baptiste Fermanian. Nonasymptotic one- and two-sample tests in high dimension with unknown covariance structure. Foundations of Modern Statistics: Festschrift in Honor of Vladimir Spokoiny, Berlin, Germany, November 6-8, 2019, Moscow, Russia, November 30, 2019, Springer Proceedings in Mathematics & Statistics (PROMS), vol. 425, Springer International Publishing, July 2023, pp. 121-162.

Doctoral dissertations and habilitation theses

34 Georg Grützner. Möbius spaces and large-scale geometry. Université Paris-Saclay, May 2023.
35 Olympio Hacquard. From topological features to machine learning models: a journey through persistence diagrams. Université Paris-Saclay, September 2023.
36 Vadim Lebovici. Two complementary approaches in multi-parameter persistence: interval-decompositions and constructible functions. Université Paris-Saclay, September 2023.

Reports & preprints

37 Dominique Attali, Hana Dal Poz Kouřimská, Christopher Fillmore, Ishika Ghosh, André Lieutier, Elizabeth Stephenson and Mathijs Wintraecken. Tight bounds for the learning of homotopy à la Niyogi, Smale, and Weinberger for subsets of Euclidean spaces and of Riemannian manifolds. 2022.
38 Gilles Blanchard, Alexandra Carpentier and Oleksandr Zadorozhnyi. Moment inequalities for sums of weakly dependent random fields. July 2023.
39 Jean-Daniel Boissonnat, Florestan Brunck, Hana Dal Poz Kouřimská, Arijit Ghosh and Mathijs Wintraecken. Simplicial subdivision of simplices of arbitrary dimension in a space of constant non-zero curvature with bounded quality. November 2023.
40 Jean-Daniel Boissonnat and Kunal Dutta. Dimensionality Reduction for Persistent Homology with Gaussian Kernels. January 2023.
41 Jean-Daniel Boissonnat, Siargey Kachanovich and Mathijs Wintraecken. Triangulating submanifolds: An elementary and quantified version of Whitney's method. 2023.
42 Blanche Buet and Xavier Pennec. Flagfolds. May 2023.
43 Denys Bulavka, Olivier Devillers, Philippe Duchon, Marc Glisse and Xavier Goaoc. Two Lower Bounds for Random Point Sets via Negative Association. 2023.
44 Jeremie Capitao-Miniconi, Élisabeth Gassiat and Luc Lehéricy. Support and distribution inference from noisy data. April 2023.
45 Erin Wolf Chambers, Christopher Fillmore, Elizabeth Stephenson and Mathijs Wintraecken. Burning or collapsing the medial axis is unstable. November 2023.
46 Frédéric Chazal, Laure Ferraris, Pablo Groisman, Matthieu Jonckheere, Frédéric Pascal and Facundo Sapienza. Choosing the parameter of the Fermat distance: navigating geometry and noise. 2023.
47 Colleen Delaney, Clément Maria and Eric Samperton. An algorithm for Tambara-Yamagami quantum invariants of 3-manifolds, parameterized by the first Betti number. November 2023.
48 Ulysse Gazin, Gilles Blanchard and Etienne Roquain. Transductive conformal inference with adaptive scores. October 2023.
49 Marc Glisse. Fast persistent homology computation for functions on ℝ. January 2023.
50 Georg Alexander Gruetzner. Asymptotic-Möbius maps. February 2023.
51 Georg Alexander Gruetzner. Harmonic analysis on the boundary of hyperbolic groups. February 2023.
52 Olympio Hacquard, Gilles Blanchard and Clément Levrard. Statistical learning on measures: an application to persistence diagrams. March 2023.
53 Olympio Hacquard and Vadim Lebovici. Euler Characteristic Tools For Topological Data Analysis. March 2023.
54 Felix Hensel, Charles Arnal, Mathieu Carrière, Théo Lacombe, Hiroaki Kurihara, Yuichi Ike and Frédéric Chazal. MAGDiff: Covariate Data Set Shift Detection via Activation Graphs of Deep Neural Networks. May 2023.
55 Hana Dal Poz Kouřimská, André Lieutier and Mathijs Wintraecken. The medial axis of closed bounded sets is Lipschitz stable with respect to the Hausdorff distance under ambient diffeomorphisms. November 2023.
56 André Lieutier and Mathijs Wintraecken. Hausdorff and Gromov-Hausdorff stable subsets of the medial axis. April 2023.
57 David Loiseaux, Mathieu Carrière and Andrew Blumberg. Fast, Stable and Efficient Approximation of Multi-parameter Persistence Modules with MMA. June 2023.
58 Wojciech Reise, Bertrand Michel and Frédéric Chazal. Topological signatures of periodic-like signals. June 2023.

Other scientific publications

59 Mathijs Wintraecken. Translation of "Simplizialzerlegungen von Beschränkter Flachheit" by Hans Freudenthal, Annals of Mathematics, Second Series, Volume 43, Number 3, July 1942, Pages 580-583. 2023.

12.3 Other

Scientific popularization

60 Davide Faranda, Théo Lacombe, Nina Otter and Kristian Strommen. Climate Science at the Interface Between Topological Data Analysis and Dynamical Systems Theory. February 2024, pp. 267-271.

