2023 Activity report Project-Team DATASHAPE
RNSR: 201622050C
 Research centers: Inria Saclay Centre at Université Paris-Saclay, Inria Centre at Université Côte d'Azur
 In partnership with: Université Paris-Saclay, CNRS
 Team name: Understanding the shape of data
 In collaboration with: Laboratoire de mathématiques d'Orsay de l'Université de Paris-Sud (LMO)
 Domain: Algorithmics, Programming, Software and Architecture
 Theme: Algorithmics, Computer Algebra and Cryptology
Keywords
Computer Science and Digital Science
 A3. Data and knowledge
 A3.4. Machine learning and statistics
 A7.1. Algorithms
 A8. Mathematics of computing
 A8.1. Discrete mathematics, combinatorics
 A8.3. Geometry, Topology
 A9. Artificial intelligence
Other Research Topics and Application Domains
 B1. Life sciences
 B2. Health
 B5. Industry of the future
 B9. Society and Knowledge
 B9.5. Sciences
1 Team members, visitors, external collaborators
Research Scientists
 Frederic Chazal [Team leader, INRIA, Senior Researcher, HDR]
 Charles Arnal [INRIA, Starting Research Position, from Nov 2023]
 Jean-Daniel Boissonnat [INRIA, Emeritus, HDR]
 Mathieu Carrière [INRIA, Researcher]
 David Cohen-Steiner [INRIA, Researcher]
 Marc Glisse [INRIA, Researcher]
 Jisu Kim [INRIA, Starting Research Position, until Mar 2023]
 Clément Maria [INRIA, Researcher]
 Nina Otter [INRIA, ISFP, from Oct 2023]
 Mathijs Wintraecken [INRIA, ISFP, from Feb 2023]
Faculty Members
 Gilles Blanchard [UNIV PARIS SACLAY, Associate Professor, HDR]
 Blanche Buet [UNIV PARIS SACLAY, Associate Professor]
 Pierre Pansu [UNIV PARIS SACLAY, Associate Professor, HDR]
Post-Doctoral Fellows
 Charles Arnal [INRIA, Post-Doctoral Fellow, until Oct 2023]
 Solenne Gaucher [INRIA, Post-Doctoral Fellow, until Aug 2023]
 Felix Hensel [INRIA, Post-Doctoral Fellow, until May 2023]
PhD Students
 Charly Boricaud [UNIV PARIS SACLAY]
 Jeremie Capitao-Miniconi [UNIV PARIS SACLAY]
 Antoine Commaret [UNIV COTE D'AZUR]
 Bastien Dussap [UNIV PARIS SACLAY]
 Henrique Ennes [UNIV COTE D'AZUR, from Oct 2023]
 Laure Ferraris [INRIA]
 Georg Alexander Gruetzner [STUDIENSTIFTUNG, until Mar 2023]
 Alexandre Guerin [Sysnav]
 Olympio Hacquard [UNIV PARIS SACLAY, until Aug 2023]
 Hugo Henneuse [UNIV PARIS SACLAY]
 Vadim Lebovici [ENS PARIS, until Aug 2023]
 David Loiseaux [INRIA]
 Wojciech Reise [INRIA, until Nov 2023]
 Christophe Vuong [TELECOM PARIS]
Technical Staff
 Vincent Rouvreau [INRIA, Engineer]
 Hannah Schreiber [INRIA, Engineer]
Interns and Apprentices
 Hurtado Quiceno Andrea [INRIA, Intern, from May 2023 until Aug 2023]
 Raphael De Maleprade [ENS PARIS-SACLAY, Intern, from Apr 2023 until Aug 2023]
 Simon Delalande [INRIA, Intern, until Feb 2023]
 Mohamed Hedi Derbel [INRIA, Intern, from Apr 2023 until Aug 2023]
 Andrea Vanessa Hurtado Quiceno [INRIA, Intern, from May 2023 until Aug 2023]
Administrative Assistants
 Aissatou-Sadio Diallo [INRIA]
 Sophie Honnorat [INRIA]
Visiting Scientist
 John Harvey [UNIV CARDIFF, until Jan 2023]
External Collaborators
 Jisu Kim [LMO, from Apr 2023 until Aug 2023]
 Bertrand Michel [CENTRALE NANTES]
2 Overall objectives
DataShape is a research project in Topological Data Analysis (TDA), a recent field whose aim is to uncover, understand and exploit the topological and geometric structure underlying complex and possibly high dimensional data. The overall objective of the DataShape project is to settle the mathematical, statistical and algorithmic foundations of TDA and to disseminate and promote our results in the data science community.
The approach of DataShape relies on the conviction that it is necessary to combine statistical, topological/geometric and computational approaches in a common framework, in order to face the challenges of TDA. Another conviction of DataShape is that TDA needs to be combined with other data science approaches and tools to lead to successful real applications. It is necessary for TDA challenges to be simultaneously addressed from the fundamental and applied sides.
The team members have actively contributed to the emergence of TDA during the last few years. The variety of expertise, going from fundamental mathematics to software development, and the strong interactions within our team as well as numerous well established international collaborations make our group one of the best to achieve these goals.
The expected output of DataShape is twofold. First, we intend to set up and develop the mathematical, statistical and algorithmic foundations of Topological and Geometric Data Analysis. Second, we intend to pursue the development of the GUDHI platform, which was initiated by the team members and is becoming a standard tool in TDA, in order to provide an efficient state-of-the-art toolbox for the understanding of the topology and geometry of data. The ultimate goal of DataShape is to develop and promote TDA as a new family of well-founded methods to uncover and exploit the geometry of data. This also includes clarifying the position and complementarity of TDA with respect to other approaches and tools in data science. Our objective is also to provide practically efficient and flexible tools that can be used independently, complementarily or in combination with other classical data analysis and machine learning approaches.
3 Research program
3.1 Algorithmic aspects and new mathematical directions for topological and geometric data analysis
TDA requires constructing and manipulating appropriate representations of complex and high-dimensional shapes. A major difficulty comes from the fact that the complexity of the data structures and algorithms used to approximate shapes grows rapidly as the dimensionality increases, which makes them intractable in high dimensions. We focus our research on simplicial complexes, which offer a convenient representation of general shapes and generalize graphs and triangulations. Our work includes the study of simplicial complexes with good approximation properties and the design of compact data structures to represent them.
In low dimensions, effective shape reconstruction techniques exist that can provide precise geometric approximations very efficiently and under reasonable sampling conditions. Extending those techniques to higher dimensions, as is required in the context of TDA, is problematic since almost all methods in low dimensions rely on the computation of a subdivision of the ambient space. A direct extension of those methods would immediately lead to algorithms whose complexities depend exponentially on the ambient dimension, which is prohibitive in most applications. A first direction to bypass the curse of dimensionality is to develop algorithms whose complexities depend on the intrinsic dimension of the data (which most of the time is small although unknown) rather than on the dimension of the ambient space. Another direction is to resort to cruder approximations that only capture the homotopy type or the homology of the sampled shape. The recent theory of persistent homology provides a powerful and robust tool to study the homology of sampled spaces in a stable way.
3.2 Statistical aspects of topological and geometric data analysis
The wide variety of ever larger available data sets, often corrupted by noise and outliers, requires considering the statistical properties of their topological and geometric features and proposing new relevant statistical models for their study.
There exist various statistical and machine learning methods intended to uncover the geometric structure of data. Beyond manifold learning and dimensionality reduction approaches, which generally do not allow one to assert the relevance of the inferred topological and geometric features and are not well suited for the analysis of complex topological structures, set estimation methods aim to estimate, from random samples, a set around which the data is concentrated. In these methods, which include support and manifold estimation, principal curves/manifolds and their various generalizations, to name a few, the estimation problems are usually considered under losses, such as the Hausdorff distance or the symmetric difference, that are not sensitive to the topology of the estimated sets, which prevents these tools from directly inferring topological or geometric information.
Regarding purely topological features, the statistical estimation of the homology or homotopy type of compact subsets of Euclidean spaces has only been considered recently, most of the time under the quite restrictive assumption that the data are randomly sampled from smooth manifolds.
In a more general setting, with the emergence of new geometric inference tools based on the study of distance functions and algebraic topology tools such as persistent homology, computational topology has recently seen an important development offering a new set of methods to infer relevant topological and geometric features of data sampled in general metric spaces. The use of these tools remains widely heuristic and until recently there were only a few preliminary results establishing connections between geometric inference, persistent homology and statistics. However, this direction has attracted a lot of attention over the last three years. In particular, stability properties and new representations of persistent homology information have led to very promising results to which the DataShape members have significantly contributed. These preliminary results open many perspectives and research directions that need to be explored.
Our goal is to build on our first statistical results in TDA to develop the mathematical foundations of Statistical Topological and Geometric Data Analysis. Combined with the other objectives, our ultimate goal is to provide a well-founded and effective statistical toolbox for the understanding of the topology and geometry of data.
3.3 Topological and geometric approaches for machine learning
This objective is driven by the problems raised by the use of topological and geometric approaches in machine learning. The goal is both to use our techniques to better understand the role of topological and geometric structures in machine learning problems and to apply our TDA tools to develop specialized topological approaches to be used in combination with other machine learning methods.
3.4 Experimental research and software development
We develop a high-quality open source software platform called GUDHI, which is becoming a reference in geometric and topological data analysis in high dimensions. The goal is not to provide code tailored to the numerous potential applications but rather to provide the central data structures and algorithms that underlie applications in geometric and topological data analysis.
The development of the GUDHI platform also serves to benchmark and optimize new algorithmic solutions resulting from our theoretical work. Such development necessitates a whole line of research on software architecture and interface design, heuristics and fine-tuning optimization, robustness and arithmetic issues, and visualization. We aim at providing a full programming environment following the same recipes that made the success of the CGAL library, the reference library in computational geometry.
Some of the algorithms implemented on the platform will also be interfaced to other software platforms, such as the R software for statistical computing, and to languages such as Python, in order to make them usable in combination with other data analysis and machine learning tools. A first attempt in this direction was made with the creation of an R package called TDA, in collaboration with the group of Larry Wasserman at Carnegie Mellon University (Inria Associated team CATS), which already includes some functionalities of the GUDHI library and implements some joint results between our team and the CMU team. A similar interface with the Python language is also considered a priority. To go even further towards helping users, we will provide utilities that perform the most common tasks without requiring any programming at all.
4 Application domains
Our work is mostly of a fundamental mathematical and algorithmic nature but finds a variety of applications in data analysis, e.g., in material science, biology, sensor networks, 3D shape analysis and processing, to name a few.
More specifically, DataShape is working on the analysis of trajectories obtained from inertial sensors (PhD theses of Wojciech Reise and Alexandre Guérin with Sysnav, participation in the DGA/ANR challenge MALIN with Sysnav) and, more generally, on the development of new TDA methods for Machine Learning and Artificial Intelligence for (multivariate) time-dependent data from various kinds of sensors in collaboration with Fujitsu, or high-dimensional point cloud data with Metafora.
DataShape is also working in collaboration with Columbia University in New York, especially with the Rabadan lab, in order to improve bioinformatics methods and analyses for single-cell genomic data. For instance, much of this work aims to use TDA tools such as persistent homology and the Mapper algorithm to characterize, quantify and study the statistical significance of biological phenomena that occur in large-scale single-cell data sets. Such biological phenomena include, among others: the cell cycle, functional differentiation of stem cells, and immune system responses (such as the spatial response on the tissue location, and the genomic response with protein expression) to breast cancer.
5 Social and environmental responsibility
5.1 Footprint of research activities
The weekly research seminar of DataShape now takes place in hybrid mode. Travel by team members has decreased substantially in recent years in order to reduce the environmental footprint of the team.
6 Highlights of the year
6.1 Awards
 Bastien Dussap obtained the best student paper award at ECML-PKDD as first author of [26].
6.2 Events
 We organized a one-week team workshop in May 2023, giving all the PhD students, postdocs and researchers of the team the opportunity to present their work and discuss scientific questions together. Two external researchers, Simon Masnou (Université Lyon 1) and Rémy Leclercq (Université Paris-Saclay), were also invited to give mini-courses.
6.3 PhD defenses
 Georg Gruetzner. Möbius spaces and large scale geometry. May 2023.
 Olympio Hacquard. From topological features to machine learning models : a journey through persistence diagrams. September 2023.
 Vadim Lebovici. Two complementary approaches in multiparameter persistence : intervaldecompositions and constructible functions. September 2023.
 Wojciech Reise. Topological techniques for inference on periodic functions with phase variation. December 2023.
 Christophe Vuong. Contributions à l’analyse stochastique pour structures sans propriété de diffusion. December 2023.
7 New software, platforms, open data
7.1 New software
7.1.1 GUDHI

Name:
Geometric Understanding in Higher Dimensions

Keywords:
Computational geometry, Topology, Clustering

Scientific Description:
The GUDHI library is an open source library for Computational Topology and Topological Data Analysis (TDA). It offers state-of-the-art algorithms to construct various types of simplicial complexes, data structures to represent them, and algorithms to compute geometric approximations of shapes and persistent homology.
The GUDHI library offers the following interoperable modules:
 Complexes:
  Cubical
  Simplicial: Rips, Witness, Alpha and Čech complexes
  Cover: Nerve and Graph induced complexes
 Data structures and basic operations:
  Simplex tree, Skeleton blockers and Toplex map
  Construction, update, filtration and simplification
 Topological descriptors computation
 Manifold reconstruction
 Topological descriptors tools:
  Bottleneck and Wasserstein distance
  Statistical tools
  Persistence diagram and barcode

Functional Description:
The GUDHI open source library will provide the central data structures and algorithms that underlie applications in geometry understanding in higher dimensions. It is intended both to help the development of new algorithmic solutions inside and outside the project, and to facilitate the transfer of results to applied fields.

News of the Year:
Below is a list of changes made since GUDHI 3.7.0 (December 2022):
 Perslay: a TensorFlow layer for persistence diagram representations.
 Cover Complex: new classes to compute Mapper, Graph Induced complex and Nerves with a scikit-learn-like interface.
 Persistent cohomology: new linear-time compute_persistence_of_function_on_line, also available through CubicalPersistence in Python.
 Cubical complex: added the possibility to build a lower-star filtration from vertices instead of top-dimensional cubes. Much faster implementation for the 2d case with input from top-dimensional cells.
 Hera version of Wasserstein distance: now provides matching in its interface.
 Subsampling: new choose_n_farthest_points_metric as a faster alternative to choose_n_farthest_points.
 SimplexTree: SimplexTree can now be used with Python pickle. A helper for_each_simplex that applies a given function object on each simplex. A new option link_nodes_by_label to speed up cofaces and stars access, when set to true. A new option stable_simplex_handles to keep Simplex handles valid even after insertions or removals, when set to true.
 Čech complex: a function assign_MEB_filtration that assigns to each simplex a filtration value equal to the squared radius of its minimal enclosing ball (MEB), given a simplicial complex and an embedding of its vertices. Applied on a Delaunay triangulation, it computes the Delaunay-Čech filtration.
 Edge collapse: a Python function reduce_graph to simplify a clique filtration (represented as a sparse weighted graph), while preserving its persistent homology.
 URL:
 Publication:

Contact:
Marc Glisse

Participants:
Clément Maria, François Godi, David Salinas, Jean-Daniel Boissonnat, Marc Glisse, Mariette Yvinec, Pawel Dlotko, Siargey Kachanovich, Vincent Rouvreau, Mathieu Carrière, Clément Jamin, Siddharth Pritam, Frederic Chazal, Steve Oudot, Wojciech Reise, Hind Montassif, Hannah Schreiber, Martin Royer, David Loiseaux

Partners:
Université Côte d'Azur (UCA), Fujitsu
7.2 Open data
 The TOPAL database of topological quantum invariants of knots and 3-manifolds. Contact: Clément MARIA
8 New results
8.1 Algorithmic aspects and new mathematical directions for topological and geometric data analysis
8.1.1 Fast persistent homology computation for functions on ℝ
Participant: Marc Glisse.
0-dimensional persistent homology is known, from a computational point of view, as the easy case. Indeed, given a list of $n$ edges in nondecreasing order of filtration value, one only needs a union-find data structure to keep track of the connected components, and we get the persistence diagram in time $O(n\alpha(n))$, where $\alpha$ is the inverse Ackermann function. The running time is thus usually dominated by sorting the edges in $\Theta(n\log n)$. A little-known fact is that, in the particularly simple case of studying the sublevel sets of a piecewise-linear function on $\mathbb{R}$ or $\mathbb{S}^{1}$, persistence can actually be computed in linear time. This note [49] presents a simple algorithm that achieves this complexity and an extension to image persistence. An implementation is available in GUDHI.
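To make the "easy case" concrete, here is a minimal sketch of the classical sort-plus-union-find approach for a piecewise-linear function on a segment, given by its values at consecutive vertices. It is not the linear-time algorithm of the note, nor GUDHI's implementation: the function name `sublevel_persistence_0d` is hypothetical, and the sort makes it run in $O(n\log n)$.

```python
import math


def sublevel_persistence_0d(values):
    """0-dimensional sublevel-set persistence of a piecewise-linear function
    on a segment, described by its values at consecutive vertices.

    Returns the finite (birth, death) pairs in order of death, followed by
    the essential pair (global minimum, inf). Illustrative sketch only.
    """
    n = len(values)
    if n == 0:
        return []
    parent = [None] * n  # None means "vertex not yet in the sublevel set"

    def find(i):
        # Path halving; the root of a component is always its minimum vertex.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    # Sweep the vertices by increasing function value (the O(n log n) step).
    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component born later dies at values[i].
                young, old = (ri, rj) if values[ri] > values[rj] else (rj, ri)
                parent[young] = old
                if values[young] < values[i]:  # skip zero-length bars
                    pairs.append((values[young], values[i]))
    pairs.append((min(values), math.inf))
    return pairs


diagram = sublevel_persistence_0d([0, 3, 1, 4, 2, 5])
```

Each local minimum other than the global one is paired with the local maximum at which its component merges into an older one, which is the elder rule applied along the line.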
8.1.2 Hausdorff and Gromov-Hausdorff Stable Subsets of the Medial Axis
Participant: Mathijs Wintraecken.
In collaboration with André Lieutier.
In [27], we introduce a pruning of the medial axis called the $(\lambda,\alpha)$-medial axis ($\mathrm{ax}_{\lambda}^{\alpha}$). We prove that the $(\lambda,\alpha)$-medial axis of a set $K$ is stable in a Gromov-Hausdorff sense under weak assumptions. More formally, we prove that if $K$ and $K'$ are close in the Hausdorff ($d_{H}$) sense, then the $(\lambda,\alpha)$-medial axes of $K$ and $K'$ are close as metric spaces, that is, the Gromov-Hausdorff distance ($d_{GH}$) between the two is $\frac{1}{4}$-Hölder in the sense that $d_{GH}(\mathrm{ax}_{\lambda}^{\alpha}(K),\mathrm{ax}_{\lambda}^{\alpha}(K'))\lesssim d_{H}(K,K')^{1/4}$. The Hausdorff distance between the two medial axes is also bounded, by $d_{H}(\mathrm{ax}_{\lambda}^{\alpha}(K),\mathrm{ax}_{\lambda}^{\alpha}(K'))\lesssim d_{H}(K,K')^{1/2}$. These quantified stability results provide guarantees for practical computations of medial axes from approximations. Moreover, they provide key ingredients for studying the computability of the medial axis in the context of computable analysis.
8.1.3 Tracing Isomanifolds in ${\mathbb{R}}^{d}$ in Time Polynomial in $d$ using Coxeter–Freudenthal–Kuhn Triangulations
Participant: Jean-Daniel Boissonnat, Siargey Kachanovich, Mathijs Wintraecken.
Isomanifolds are the generalization of isosurfaces to arbitrary dimension and codimension, i.e. submanifolds of $\mathbb{R}^{d}$ defined as the zero set of a multivariate multivalued smooth function $f:\mathbb{R}^{d}\to \mathbb{R}^{d-n}$, where $n$ is the intrinsic dimension of the manifold and we assume that 0 is a regular value. A natural way to approximate a smooth isomanifold $\mathcal{M}=f^{-1}(0)$ is to consider its Piecewise-Linear (PL) approximation $\widehat{\mathcal{M}}$ based on a triangulation $\mathcal{T}$ of the ambient space $\mathbb{R}^{d}$, whose longest edge has length $D$. In [12], we describe a simple algorithm to trace isomanifolds from a given starting point on each connected component. The algorithm works for arbitrary dimensions $n$ and $d$, and arbitrary $D$. Our main result is that, when $f$ (or $\mathcal{M}$) has bounded complexity, the complexity of the algorithm is polynomial in $d$ and $\delta =1/D$ (and unavoidably exponential in $n$). Since it is known that for $\delta =\Omega(d^{2.5})$, $\widehat{\mathcal{M}}$ is $O(D^{2})$-close and isotopic to $\mathcal{M}$, our algorithm produces a faithful PL-approximation of isomanifolds of bounded complexity in time polynomial in $d$. Combining this algorithm with dimensionality reduction techniques, the dependency on $d$ in the size of $\widehat{\mathcal{M}}$ can be completely removed with high probability. We also show that the algorithm can handle isomanifolds with boundary and, more generally, isostratifolds. The algorithm for isomanifolds with boundary has been implemented and experimental results are reported, showing that it is practical and can handle cases that are far ahead of the state of the art.
8.1.4 The reach of subsets of manifolds
Participant: Jean-Daniel Boissonnat, Mathijs Wintraecken.
Kleinjohann (Archiv der Mathematik 35(1):574–582, 1980; Mathematische Zeitschrift 176(3), 327–344, 1981) and Bangert (Archiv der Mathematik 38(1):54–57, 1982) extended the reach $\mathrm{rch}(\mathcal{S})$ from subsets $\mathcal{S}$ of Euclidean space to the reach $\mathrm{rch}_{\mathcal{M}}(\mathcal{S})$ of subsets $\mathcal{S}$ of Riemannian manifolds $\mathcal{M}$, where $\mathcal{M}$ is smooth (we'll assume at least $C^{3}$). Bangert showed that sets of positive reach in Euclidean space and Riemannian manifolds are very similar. In [13], we introduce a slight variant of Kleinjohann's and Bangert's extension and quantify the similarity between sets of positive reach in Euclidean space and Riemannian manifolds in a new way: given $p\in \mathcal{M}$ and $q\in \mathcal{S}$, we bound the local feature size (a local version of the reach) of its lifting to the tangent space via the inverse exponential map ($\exp_{p}^{-1}(\mathcal{S})$) at $q$, assuming that $\mathrm{rch}_{\mathcal{M}}(\mathcal{S})$ and the geodesic distance $d_{\mathcal{M}}(p,q)$ are bounded. These bounds are motivated by the importance of the reach and local feature size to manifold learning, topological inference, and triangulating manifolds, and by the fact that intrinsic approaches circumvent the curse of dimensionality.
8.1.5 Local Criteria for Triangulating General Manifolds
Participant: Jean-Daniel Boissonnat, Mathijs Wintraecken.
In collaboration with Arijit Ghosh and Ramsay Dyer.
In [11], we present criteria for establishing a triangulation of a manifold. Given a manifold $M$, a simplicial complex $A$, and a map $H$ from the underlying space of $A$ to $M$, our criteria are presented in local coordinate charts for $M$, and ensure that $H$ is a homeomorphism. These criteria do not require a differentiable structure, or even an explicit metric on $M$. No Delaunay property of $A$ is assumed. The result provides a triangulation guarantee for algorithms that construct a simplicial complex by working in local coordinate patches. Because the criteria are easily verified in such a setting, they are expected to be of general use.
8.1.6 Local characterizations for decomposability of 2-parameter persistence modules
Participant: Vadim Lebovici.
In collaboration with Steve Oudot (Inria) and M. B. Botnan (Vrije Universiteit)
In [14], we investigate the existence of sufficient local conditions under which poset representations decompose as direct sums of indecomposables from a given class. In our work, the indexing poset is the product of two totally ordered sets, corresponding to the setting of 2-parameter persistence in topological data analysis. Our indecomposables of interest belong to the so-called interval modules, which by definition are indicator representations of intervals in the poset. While the whole class of interval modules does not admit such a local characterization, we show that the subclass of rectangle modules does admit one and that it is, in some precise sense, the largest subclass to do so.
8.1.7 Discrete Morse Theory for Computing Zigzag Persistence
Participant: Clément Maria, Hannah Schreiber.
In [23], we introduce a theoretical and computational framework to use discrete Morse theory as an efficient preprocessing step in order to compute zigzag persistent homology. From a zigzag filtration of complexes $K_{i}$, we introduce a zigzag Morse filtration whose complexes $A_{i}$ are Morse reductions of the original complexes $K_{i}$, and we prove that both have the same persistent homology. The maps in the zigzag Morse filtration are forward and backward inclusions, as is standard in zigzag persistence, as well as a new type of map inducing non-trivial changes in the boundary operator of the Morse complex. We study this last map in detail, and design algorithms to compute the update both at the complex level and at the homology matrix level when computing zigzag persistence. We deduce an algorithm to compute the zigzag persistence of a filtration whose running time depends mostly on the number of critical cells of the complexes, and show experimentally that it performs better in practice.
8.2 Statistical aspects of topological and geometric data analysis
8.2.1 On the persistent homology of almost surely ${C}^{0}$ stochastic processes
Participant: Daniel Perez.
This paper [24] investigates the properties of the persistence diagrams stemming from almost surely continuous random processes on $[0,t]$. We focus our study on two variables which together characterize the barcode: the number of points of the persistence diagram inside a rectangle $]-\infty ,x]\times [x+\epsilon ,\infty [$, denoted $N^{x,x+\epsilon}$, and the number of bars of length $\ge \epsilon $, denoted $N^{\epsilon}$. For processes with the strong Markov property, we show that both of these variables admit a moment generating function and in particular moments of every order. Switching our attention to semimartingales, we show the asymptotic behaviour of $N^{\epsilon}$ and $N^{x,x+\epsilon}$ as $\epsilon \to 0$ and of $N^{\epsilon}$ as $\epsilon \to \infty $. Finally, we study the repercussions of the classical stability theorem of barcodes and illustrate our results with some examples, most notably Brownian motion and empirical functions converging to the Brownian bridge.
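As a small illustration of the two counting variables, the sketch below evaluates them on a finite barcode given as a list of (birth, death) pairs. The function names are hypothetical, and counting a bar with infinite death as having infinite length is a convention assumed here, not one taken from the paper.

```python
import math


def count_long_bars(diagram, eps):
    """N^eps: number of bars (birth, death) with death - birth >= eps."""
    return sum(1 for birth, death in diagram if death - birth >= eps)


def count_in_rectangle(diagram, x, eps):
    """N^{x, x+eps}: number of diagram points with birth <= x and death >= x + eps,
    i.e. points lying in the rectangle ]-inf, x] x [x + eps, inf[."""
    return sum(1 for birth, death in diagram if birth <= x and death >= x + eps)


# Toy barcode: two finite bars and one essential bar (assumed data).
diagram = [(1.0, 3.0), (2.0, 4.0), (0.0, math.inf)]
n_eps = count_long_bars(diagram, eps=2.0)
n_rect = count_in_rectangle(diagram, x=1.5, eps=1.0)
```

On this toy barcode, `n_rect` counts the bars that are alive over the whole interval $[1.5, 2.5]$, which is how $N^{x,x+\epsilon}$ relates to the crossings of that interval by the underlying function.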
8.2.2 Topological signatures of periodic-like signals
Participant: Wojciech Reise, Frédéric Chazal.
In collaboration with Bertrand Michel (Ecole Centrale Nantes)
In [58], we present a method to construct signatures of periodic-like data. Based on topological considerations, our construction encodes information about the order and values of local extrema. Its main strength is robustness to reparametrisation of the observed signal, so that it depends only on the form of the periodic function. The signature converges as the observation contains increasingly many periods. We show that it can be estimated from the observation of a single time series using bootstrap techniques.
8.2.3 Heat diffusion distance processes: a statistically founded method to analyze graph data sets
Participant: Etienne Lasalle.
In [21], we propose two multiscale comparisons of graphs using heat diffusion, which make it possible to compare graphs without node correspondence or even with different sizes. These multiscale comparisons lead to the definition of Lipschitz-continuous empirical processes indexed by a real parameter. The statistical properties of empirical means of such processes are studied in the general case. Under mild assumptions, we prove a functional Central Limit Theorem, as well as a Gaussian approximation with a rate depending only on the sample size. Once applied to our processes, these results make it possible to analyze data sets of pairs of graphs. We design consistent confidence bands around empirical means and consistent two-sample tests, using bootstrap methods. Their performance is evaluated by simulations on synthetic data sets.
8.2.4 Support and distribution inference from noisy data
Participant: Jérémie Capitao-Miniconi.
In collaboration with E. Gassiat (LMO, Univ. Paris-Saclay) and L. Lehéricy (Univ. Côte d'Azur)
In [44], we consider noisy observations of a distribution with unknown support. In the deconvolution model, it has been proved recently that, under very mild assumptions, it is possible to solve the deconvolution problem without knowing the noise distribution and with no sample of the noise. We first give general settings where the theory applies and provide classes of supports that can be recovered in this context. We then exhibit classes of distributions over which we prove adaptive minimax rates (up to a log log factor) for the estimation of the support in Hausdorff distance. Moreover, for the class of distributions with compact support, we provide estimators of the unknown (in general singular) distribution and prove maximum rates in Wasserstein distance. We also prove an almost matching lower bound on the associated minimax risk.
8.2.5 A gradient sampling algorithm for stratified maps with applications to topological data analysis
Participant: Mathieu Carrière.
In collaboration with J. Leygonie (Arda), T. Lacombe (Univ. Gustave Eiffel) and S. Oudot (Geomerix, Inria Saclay)
We introduce a novel gradient descent algorithm extending the well-known Gradient Sampling methodology to the class of stratifiably smooth objective functions, which are defined as locally Lipschitz functions that are smooth on some regular pieces, called the strata, of the ambient Euclidean space. For this class of functions, our algorithm achieves a sublinear convergence rate. We then apply our method to objective functions based on the (extended) persistent homology map computed over lower-star filters, which is a central tool of Topological Data Analysis. For this, we propose an efficient exploration of the corresponding stratification by using the Cayley graph of the permutation group. Finally, we provide benchmark and novel topological optimization problems, in order to demonstrate the utility and applicability of our framework.
8.3 Topological and geometric approaches for machine learning
8.3.1 Choosing the parameter of the Fermat distance: navigating geometry and noise.
Participant: Frédéric Chazal, Laure Ferraris.
In collaboration with P. Groisman, M. Jonckheere, F. Pascal and F. Sapienza
The Fermat distance has recently been established as a useful tool for machine learning tasks, either when a natural distance is not directly available to the practitioner or to improve the results given by Euclidean distances by exploiting the geometrical and statistical properties of the dataset. This distance depends on a parameter $\alpha $ that greatly impacts the performance of subsequent tasks. Ideally, the value of $\alpha $ should be large enough to capture the geometric intricacies inherent to the problem. At the same time, it should remain small enough to avoid the harmful effects of noise during distance estimation. In [46], we study both theoretically and through simulations how to select this parameter.
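To illustrate the role of $\alpha$, the sketch below computes the sample Fermat distance as it is commonly defined in the literature: the shortest-path length in the complete graph on the sample, where the edge between two points has weight equal to their Euclidean distance raised to the power $\alpha$. This definition is an assumption made here, since the text above does not restate it, and the function name `fermat_distances` is hypothetical.

```python
import heapq
import math


def fermat_distances(points, source, alpha):
    """Sample Fermat distances from points[source] to every sample point.

    Dijkstra on the complete graph whose edge (u, v) has weight
    |x_u - x_v| ** alpha. Illustrative sketch only.
    """
    n = len(points)
    dist = [math.inf] * n
    dist[source] = 0.0
    done = [False] * n
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if done[u]:
            continue  # stale heap entry
        done[u] = True
        for v in range(n):
            if not done[v]:
                w = math.dist(points[u], points[v]) ** alpha
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
    return dist


pts = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
d_alpha_1 = fermat_distances(pts, source=0, alpha=1.0)[2]  # straight segment wins
d_alpha_3 = fermat_distances(pts, source=0, alpha=3.0)[2]  # detour through (1, 1) wins
```

With $\alpha = 1$ the distance reduces to the Euclidean one, while a larger $\alpha$ penalizes long hops and favours paths through intermediate sample points. This is also why noisy points can distort the estimate when $\alpha$ is too large.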
8.3.2 MAGDiff: Covariate Data Set Shift Detection via Activation Graphs of Deep Neural Networks
Participant: Felix Hensel, Charles Arnal, Mathieu Carrière, Frédéric Chazal.
In collaboration with T. Lacombe (Univ. G. Eiffel), H. Kurihara (Fujitsu), Y. Ike (Kyushu Univ.)
Despite their successful application to a variety of tasks, neural networks remain limited, like other machine learning methods, by their sensitivity to shifts in the data: their performance can be severely impacted by differences in distribution between the data on which they were trained and that on which they are deployed. In 54, we propose a new family of representations, called MAGDiff, that we extract from any given neural network classifier and that allows for efficient covariate data shift detection without the need to train a new model dedicated to this task. These representations are computed by comparing the activation graphs of the neural network for samples belonging to the training distribution and to the target distribution, and yield powerful data- and task-adapted statistics for the two-sample tests commonly used for data set shift detection. We demonstrate this empirically by measuring the statistical powers of two-sample Kolmogorov-Smirnov (KS) tests on several different data sets and shift types, and showing that our novel representations induce significant improvements over a state-of-the-art baseline relying on the network output.
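The two-sample KS statistic mentioned above can be sketched as follows. This is only the generic statistic on one-dimensional samples; the inputs below are made-up numbers standing in for any scalar feature, and the MAGDiff representations themselves are not reproduced here.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: sup over x of |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every observation equal to x.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    # Once one sample is exhausted its CDF equals 1, so the gap can only shrink.
    return gap


reference = [1, 2, 3, 4]   # hypothetical feature values on the training data
shifted = [3, 4, 5, 6]     # hypothetical feature values on the deployment data
stat = ks_statistic(reference, shifted)
```

A large statistic is evidence that the two samples come from different distributions. Turning it into a test additionally requires a threshold or p-value, which is omitted from this sketch.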
8.3.3 Label Shift Quantification with Robustness Guarantees via Distribution Feature Matching
Participant: Bastien Dussap, Gilles Blanchard.
In collaboration with BadrEdine ChériefAbdellatif (CNRS, LPSM, U. ParisSorbonne)
Quantification learning deals with the task of estimating the target label distribution under label shift. In 26, we established a unifying framework, distribution feature matching (DFM), that recovers as particular instances various estimators introduced in previous literature. We derived a general performance bound for DFM procedures, improving in several key aspects upon previous bounds derived in particular cases. We then extended this analysis to study robustness of DFM procedures in the misspecified setting under departure from the exact label shift hypothesis, in particular in the case of contamination of the target by an unknown distribution. These theoretical findings were confirmed by a detailed numerical study on simulated and real-world datasets. We also introduced an efficient, scalable and robust version of kernel-based DFM using the Random Fourier Feature principle.
This paper received the "Best student paper" award at the ECML/PKDD conference 2023.
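The feature-matching idea behind DFM can be illustrated in its simplest form: two classes and a scalar feature map, where the target proportion is chosen so that the mixture of source class means matches the target mean. The function names and the data are hypothetical, and this toy version is not the estimator analyzed in 26.

```python
def mean(values):
    return sum(values) / len(values)


def estimate_positive_proportion(source_neg, source_pos, target, feature=lambda x: x):
    """Two-class feature matching with a scalar feature map.

    Finds alpha in [0, 1] minimising
    |(1 - alpha) * m_neg + alpha * m_pos - m_target|,
    where m_neg, m_pos are the mean features of the labelled source classes
    and m_target is the mean feature of the unlabelled target sample.
    """
    m_neg = mean([feature(x) for x in source_neg])
    m_pos = mean([feature(x) for x in source_pos])
    m_tgt = mean([feature(x) for x in target])
    if m_pos == m_neg:
        raise ValueError("feature map does not separate the two classes")
    alpha = (m_tgt - m_neg) / (m_pos - m_neg)
    return min(1.0, max(0.0, alpha))  # clip to the valid range [0, 1]


source_neg = [0.0, 0.1, -0.1]     # labelled source points of class 0 (made up)
source_pos = [1.0, 1.1, 0.9]      # labelled source points of class 1 (made up)
target = [0.0, 1.0, 1.0, 1.0]     # unlabelled target points (made up)
alpha_hat = estimate_positive_proportion(source_neg, source_pos, target)
```

Under label shift the class-conditional distributions are unchanged, which is why source class means can be reused to explain the target mean. Richer feature maps, such as kernel mean embeddings, follow the same matching principle in higher dimension.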
8.3.4 Post hoc false discovery proportion inference under a Hidden Markov Model
Participant: Gilles Blanchard.
In collaboration with Marie PerrotDockès (MAP5, U. Paris), Étienne Roquain (LPSM, U. ParisSorbonne), Pierre Neuvial (CNRS, U. Toulouse)
We addressed in 25 the multiple testing problem under the assumption that the true/false hypotheses are driven by a Hidden Markov Model (HMM), which is recognized as a fundamental setting to model multiple testing under dependence since the seminal work of Sun and Cai (2009). While previous work has concentrated on deriving specific procedures with a controlled False Discovery Rate (FDR) under this model, following a recent trend in selective inference, we considered the problem of establishing confidence bounds on the false discovery proportion (FDP), for a user-selected set of hypotheses that can depend on the observed data in an arbitrary way. We developed a methodology to construct such confidence bounds first when the HMM model is known, then when its parameters are unknown and estimated, including the data distribution under the null and the alternative, using a nonparametric approach. In the latter case, we proposed a bootstrap-based methodology to take into account the effect of parameter estimation error. We showed that taking advantage of the assumed HMM structure allows for a substantial improvement of confidence bound sharpness over existing agnostic (structure-free) methods, as witnessed both via numerical experiments and real data examples.
8.3.5 Stable vectorization of multiparameter persistent homology using signed barcodes as measures
Participant: Mathieu Carrière, David Loiseaux.
In collaboration with Luis Scoccola (Oxford University), Steve Oudot (Geomerix, Inria Saclay) and Magnus Botnan (Vrije Universiteit)
Persistent homology (PH) provides topological descriptors for geometric data, such as weighted graphs, which are interpretable, stable to perturbations, and invariant under, e.g., relabeling. Most applications of PH focus on the one-parameter case – where the descriptors summarize the changes in topology of data as it is filtered by a single quantity of interest – and there is now a wide array of methods enabling the use of one-parameter PH descriptors in data science, which rely on the stable vectorization of these descriptors as elements of a Hilbert space. Although the multiparameter PH (MPH) of data that is filtered by several quantities of interest encodes much richer information than its one-parameter counterpart, the scarceness of stability results for MPH descriptors has so far limited the available options for the stable vectorization of MPH. In this paper, we aim to bring together the best of both worlds by showing how the interpretation of signed barcodes – a recent family of MPH descriptors – as signed measures leads to natural extensions of vectorization strategies from one parameter to multiple parameters. The resulting feature vectors are easy to define and to compute, and provably stable. While, as a proof of concept, we focus on simple choices of signed barcodes and vectorizations, we already see notable performance improvements when comparing our feature vectors to state-of-the-art topology-based methods on various types of data.
8.3.6 A framework for fast and stable representations of multiparameter persistent homology decompositions
Participant: Mathieu Carrière, David Loiseaux.
In collaboration with Andrew Blumberg (Columbia University, NYC)
Topological data analysis (TDA) is an area of data science that focuses on using invariants from algebraic topology to provide multiscale shape descriptors for geometric data sets such as point clouds. One of the most important such descriptors is persistent homology, which encodes the change in shape as a filtration parameter changes; a typical parameter is the feature scale. For many data sets, it is useful to simultaneously vary multiple filtration parameters, for example feature scale and density. While the theoretical properties of single-parameter persistent homology are well understood, less is known about the multiparameter case. In particular, a central question is the problem of representing multiparameter persistent homology by elements of a vector space for integration with standard machine learning algorithms. Existing approaches to this problem either ignore most of the multiparameter information to reduce to the one-parameter case or are heuristic and potentially unstable in the face of noise. In this article, we introduce a new general representation framework that leverages recent results on decompositions of multiparameter persistent homology. This framework is rich in information, fast to compute, and encompasses previous approaches. Moreover, we establish theoretical stability guarantees under this framework as well as efficient algorithms for practical computation, making this framework an applicable and versatile tool for analyzing geometric and point cloud data. We validate our stability results and algorithms with numerical experiments that demonstrate statistical convergence, prediction accuracy, and fast running times on several real data sets.
8.4 Algorithmic and Combinatorial Aspects of Low Dimensional Topology
8.4.1 An algorithm for Tambara-Yamagami quantum invariants of 3-manifolds, parameterized by the first Betti number
Participant: Clément Maria.
In collaboration with Colleen Delaney (Berkeley), Eric Samperton (Purdue).
Quantum topology provides various frameworks for defining and computing invariants of manifolds. One such framework of substantial interest in both mathematics and physics is the Turaev-Viro-Barrett-Westbury state sum construction, which uses the data of a spherical fusion category to define topological invariants of triangulated 3-manifolds via tensor network contractions. In this work 47 we consider a restricted class of state sum invariants of 3-manifolds derived from Tambara-Yamagami categories. These categories are particularly simple, being entirely specified by three pieces of data: a finite abelian group, a bicharacter of that group, and a sign $\pm 1$. Despite being one of the simplest sources of state sum invariants, the computational complexities of Tambara-Yamagami invariants are yet to be fully understood. We make substantial progress on this problem. Our main result is the existence of a general fixed-parameter tractable (FPT) algorithm for all such topological invariants, where the parameter is the first Betti number of the 3-manifold with $\mathbb{Z}/2\mathbb{Z}$ coefficients. We also explain that these invariants are sometimes #P-hard to compute (and we expect that this is almost always the case). Contrary to other domains of computational topology, such as graphs on surfaces, very few hard problems in 3-manifold topology are known to admit FPT algorithms with a topological parameter. However, such algorithms are of particular interest as their complexity depends only polynomially on the combinatorial representation of the input, regardless of size or combinatorial width. Additionally, in the case of Betti numbers, the parameter itself is easily computable in polynomial time.
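The last remark, that the parameter is computable in polynomial time, comes down to linear algebra over $\mathbb{Z}/2\mathbb{Z}$: the first Betti number equals the number of edges minus the ranks of the first two boundary matrices. The sketch below illustrates this on small simplicial complexes given by edge and triangle lists (higher-dimensional simplices do not affect the first Betti number). It is not the algorithm of 47, and the function names are illustrative.

```python
def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    basis = {}  # leading bit position -> reduced row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                break
            row ^= basis[lead]  # cancel the leading bit and keep reducing
    return len(basis)


def first_betti_number_mod2(edges, triangles):
    """b_1 with Z/2Z coefficients: #edges - rank(d_1) - rank(d_2)."""
    edge_index = {frozenset(e): k for k, e in enumerate(edges)}
    # Row of d_1 for edge {u, v}: bits u and v are set.
    d1 = [(1 << u) | (1 << v) for u, v in edges]
    # Row of d_2 for triangle {a, b, c}: bits of its three edges are set.
    d2 = [
        sum(1 << edge_index[frozenset(pair)] for pair in ((a, b), (a, c), (b, c)))
        for a, b, c in triangles
    ]
    return len(edges) - rank_gf2(d1) - rank_gf2(d2)


circle_edges = [(0, 1), (1, 2), (0, 2)]
b1_hollow = first_betti_number_mod2(circle_edges, [])            # one loop
b1_filled = first_betti_number_mod2(circle_edges, [(0, 1, 2)])   # loop is filled in
```

Gaussian elimination over GF(2) is cubic in the matrix size at worst, which is what makes the parameter cheap to obtain compared with the invariants themselves.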
8.4.2 Hard Diagrams of the Unknot
Participant: Clément Maria.
In collaboration with Benjamin Burton (University of Queensland), Hsien-Chih Chang (Dartmouth College), Maarten Löffler (TU/e - Eindhoven University of Technology), Arnaud de Mesmay (LIGM - Laboratoire d'Informatique Gaspard-Monge), Saul Schleimer (University of Warwick), Eric Sedgwick (DePaul University), Jonathan Spreer (The University of Sydney).
We present three “hard” diagrams of the unknot 15. They require (at least) three extra crossings before they can be simplified to the trivial unknot diagram via Reidemeister moves in $S^{2}$. Both examples are constructed by applying previously proposed methods. The proof of their hardness uses significant computational resources. We also determine that no small “standard” example of a hard unknot diagram requires more than one extra crossing for Reidemeister moves in $S^{2}$.
8.5 Miscellaneous
8.5.1 Variational Shape Reconstruction via Quadric Error Metrics
Participant: David CohenSteiner.
In collaboration with Tong Zhao, Pierre Alliez (Inria team Titane), Laurent Busé (Inria team Aromath), Tamy Boubekeur and JeanMarc Thiery (Adobe Research)
Inspired by the strengths of quadric error metrics initially designed for mesh decimation, we propose a concise mesh reconstruction approach for 3D point clouds 32. Our approach proceeds by clustering the input points enriched with quadric error metrics, where the generator of each cluster is the optimal 3D point for the sum of its quadric error metrics. This approach favors the placement of generators on sharp features, and tends to equidistribute the error among clusters. We reconstruct the output surface mesh from the adjacency between clusters and a constrained binary solver. We combine our clustering process with an adaptive refinement driven by the error. Compared to prior art, our method avoids dense reconstruction prior to simplification and produces immediately an optimized mesh.
8.5.2 Two Lower Bounds for Random Point Sets via Negative Association
Participant: Marc Glisse.
In collaboration with Denys Bulavka (Charles University, Prague), Olivier Devillers (Inria team Gamble), Philippe Duchon (Laboratoire Bordelais de Recherche en Informatique) and Xavier Goaoc (Inria team Gamble)
We present 43 two lower bounds that hold with high probability for random point sets. We first give a new, and elementary, proof that the classical models of random point sets (uniform in a smooth convex body, uniform in a polygon, Gaussian) have a superconstant number of extreme points with high probability. We next prove that any algorithm that determines the orientation of all triples in a planar set of $n$ points (that is, the order type of the point set) from their Cartesian coordinates must read with high probability $4n\log n - O(n\log\log n)$ coordinate bits. This matches previously known upper bounds. Both bounds rely on a method due to Dubhashi and Ranjan (Random Structures and Algorithms, 1998) for obtaining concentration results via a negative association property.
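For readers unfamiliar with order types, the following sketch computes the orientation of every triple of a small planar point set with the standard determinant predicate. Integer coordinates keep the arithmetic exact. The names `orientation` and `order_type` and the example points are illustrative and do not come from 43.

```python
from itertools import combinations


def orientation(p, q, r):
    """Sign of the determinant of (q - p, r - p).

    +1 for a counterclockwise triple, -1 for clockwise, 0 for collinear.
    """
    det = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (det > 0) - (det < 0)


def order_type(points):
    """Orientation of every index triple i < j < k of the point set."""
    return {
        (i, j, k): orientation(points[i], points[j], points[k])
        for i, j, k in combinations(range(len(points)), 3)
    }


square = [(0, 0), (1, 0), (0, 1), (1, 1)]
signs = order_type(square)
```

The lower bound in the text concerns how many coordinate bits any algorithm must inspect to fill in such a table correctly, not the cost of evaluating the determinants.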
8.5.3 Harmonic analysis on the boundary of hyperbolic groups
Participant: Georg Gruetzner.
In this paper 51 we show that a Möbius structure $\mathcal{M}$ of dimension $Q$ has a minimal Ahlfors-David constant. This shows that a Möbius space is uniformly $Q$-Ahlfors-David regular. In summary, many classical theorems of harmonic analysis on $\mathbb{R}^{n}$ admit a Möbius-invariant formulation in the context of Möbius geometry. We use this observation to show that the Knapp-Stein operator
is a continuous operator on the weighted $L^{2}$-space $L^{2}\left(\left(\frac{d'}{d}\right)^{\alpha} d\mu_{d}\right)$, with a norm independent of $d$ and $d'$.
From here we construct a Sobolev space $\mathscr{H}_{d}^{\alpha}$ on $s$-densities for a given $s$ as a function of $\alpha $. We would like to say that the construction is topologically independent of the metric $d$. In this paper we prove that the norms on a large class of functions are comparable.
The work is inspired by a paper by Astengo, Cowling, and Di Blasio, who construct uniformly bounded representations for simple Lie groups of rank 1. We formulate the problem in the much more general framework of groups acting on Möbius structures, which includes in particular all hyperbolic groups.
8.5.4 Constant regret for sequence prediction with limited advice
Participant: Gilles Blanchard.
In collaboration with El Mehdi Saad (CentraleSupelec, U. ParisSaclay)
We investigated in 30 the problem of cumulative regret minimization for individual sequence prediction with respect to the best expert in a finite family of size $K$ under limited access to information. We assume that in each round, the learner can predict using a convex combination of at most $p$ experts, then observe a posteriori the losses of at most $m$ experts. We assume that the loss function is range-bounded and exp-concave. In the standard multi-armed bandits setting, when the learner is allowed to play only one expert per round and observe only its feedback, known optimal regret bounds are of the order $\mathcal{O}\left(\sqrt{KT}\right)$. We show that allowing the learner to play one additional expert per round and observe one additional feedback substantially improves the guarantees on regret. We provide a strategy combining only $p=2$ experts per round for prediction and observing $m\ge 2$ experts' losses. Its randomized regret (with respect to the internal randomization of the learner's strategy) is of order $\mathcal{O}\left((K/m)\log\left(K\delta^{-1}\right)\right)$ with probability $1-\delta $, i.e., is independent of the horizon $T$ (“constant” or “fast rate” regret) if $p\ge 2$ and $m\ge 3$. We prove that this rate is optimal up to a logarithmic factor in $K$. In the case $p=m=2$, we provide an upper bound of order $\mathcal{O}\left(K^{2}\log\left(K\delta^{-1}\right)\right)$, with probability $1-\delta $. Our strategies do not require any prior knowledge of the horizon $T$ nor of the confidence parameter $\delta $. Finally, we show that if the learner is constrained to observe only one expert feedback per round, the worst-case regret is the “slow rate” $\Omega \left(\sqrt{KT}\right)$, suggesting that synchronous observation of at least two experts per round is necessary to have a constant regret.
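For reference, the quantities discussed above can be written in the standard form below. The notation is assumed here for illustration and is not taken from 30: $F_{i,t}$ is the prediction of expert $i$ at round $t$, $y_t$ the outcome, $\ell $ the loss, $S_t$ the set of experts combined at round $t$, and $w_{i,t}$ the convex weights.

$$\hat{y}_t = \sum_{i \in S_t} w_{i,t} F_{i,t}, \quad |S_t| \le p, \qquad R_T = \sum_{t=1}^{T} \ell(\hat{y}_t, y_t) - \min_{1 \le i \le K} \sum_{t=1}^{T} \ell(F_{i,t}, y_t).$$

The loss is called exp-concave when, for some $\eta > 0$ and every outcome $y$, the map $z \mapsto \exp(-\eta\, \ell(z, y))$ is concave. A “constant” regret then means a bound on $R_T$ that does not grow with $T$.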
8.5.5 Flagfolds
Participant: Blanche Buet.
In collaboration with Xavier Pennec (INRIA Team Epione, Sophia Antipolis)
In 42, by interpreting the product of the Principal Component Analysis, that is the covariance matrix, as a sequence of nested subspaces naturally coming with weights according to the level of approximation they provide, we are able to embed all $d$–dimensional Grassmannians into a stratified space of covariance matrices. We observe that Grassmannians constitute the lowest dimensional skeleton of the stratification while it is possible to define a Riemannian metric on the highest dimensional and dense stratum, such a metric being compatible with the global stratification. With such a Riemannian metric at hand, it is possible to look for geodesics between two linear subspaces of different dimensions that do not go through higher dimensional linear subspaces as Euclidean geodesics would. Building upon the proposed embedding of Grassmannians into the stratified space of covariance matrices, we generalize the concept of varifolds to what we call flagfolds in order to model multidimensional shapes.
9 Bilateral contracts and grants with industry
9.1 Bilateral contracts with industry

Participants: Alexandre Guérin, Frédéric Chazal.
Collaboration with Sysnav, a French SME with world leading expertise in navigation and geopositioning in extreme environments, on TDA, geometric approaches and machine learning for the analysis of movements of pedestrians and patients equipped with inertial sensors (CIFRE PhD of Alexandre Guérin).

Participants: Felix Hensel, Theo Lacombe, Marc Glisse, Mathieu Carrière, Frédéric Chazal.
Research collaboration with Fujitsu on the development of new TDA methods and tools for Machine learning and Artificial Intelligence (started in Dec 2017).

Participants: Bastien Dussap, Marc Glisse, Gilles Blanchard.
Research collaboration with MetaFora on the development of new TDA-based and statistical methods for the analysis of cytometric data (started in Nov. 2019).

Participants: David CohenSteiner.
Collaboration with Dassault Systèmes and Inria team Geomerix (Saclay) on the applications of methods from geometric measure theory to the modelling and processing of complex 3D shapes (PhD of Lucas Brifault, started in May 2022).
10 Partnerships and cooperations
10.1 International research visitors
10.1.1 Visits of international scientists
Other international visits to the team
Wolfgang Polonik
 Status: researcher
 Institution of origin: UC Davis
 Country: USA
 Dates: December 2023
 Context of the visit: research stay, PhD jury
 Mobility program/type of mobility: research stay
10.1.2 Visits to international teams
Sabbatical programme
Clément Maria
 Visited institution: Escola de Matemática Aplicada of the Getúlio Vargas Foundation (Brazil)
 Dates of the stay: from 1 October 2022 to 30 September 2023
 Summary of the stay: This is a one-year sabbatical stay whose main goal is to set up and reinforce new long-term collaborations in computational and applied topology and topological data analysis.
Research stays abroad
Gilles Blanchard
 Visited institution: University of Potsdam (Germany)
 Dates of the stay: 01/01/2023 to 31/07/2023
 Summary of the stay: This was a 7-month research stay for collaboration with the Collaborative Research Center "Data Assimilation" of the University of Potsdam (speaker: Prof. Sebastian Reich).
10.2 National initiatives
10.2.1 ANR
ANR Chair in AI
Participants: Frédéric Chazal, Marc Glisse, Louis Pujol, Wojciech Reise.
 Acronym : TopAI
 Type : ANR Chair in AI.
 Title : Topological Data Analysis for Machine Learning and AI
 Coordinator : Frédéric Chazal
 Duration : 4 years from September 2020 to August 2024.
 Other partners: Two industrial partners, the French SME Sysnav and the French startup MetaFora.
 Abstract:
The TopAI project aims at developing a world-leading research activity on topological and geometric approaches in Machine Learning (ML) and AI with a double academic and industrial/societal objective. First, building on the strong expertise of the candidate and his team in TDA, TopAI aims at designing new mathematically well-founded topological and geometric methods and tools for Data Analysis and ML and at making them available to the data science and AI community through state-of-the-art software tools. Second, thanks to already established close collaborations and the strong involvement of French industrial partners, TopAI aims at exploiting its expertise and tools to address a set of challenging problems with high societal and economic impact in personalized medicine and AI-assisted medical diagnosis.
ANR ALGOKNOT
Participants: Clément Maria.
 Acronym : ALGOKNOT.
 Type : ANR Jeune Chercheuse Jeune Chercheur.
 Title : Algorithmic and Combinatorial Aspects of Knot Theory.
 Coordinator : Clément Maria.
 Duration : 2020 – 2025 (5 years).
 Abstract: The project AlgoKnot aims at strengthening our understanding of the computational and combinatorial complexity of the diverse facets of knot theory, as well as designing efficient algorithms and software to study their interconnections.
 See also: Clément Maria and ANR AlgoKnot.
ANR GeMfaceT
Participants: Blanche Buet.
 Acronym: GeMfaceT.
 Type: ANR JCJC CES 40 – Mathématiques
 Title: A bridge between Geometric Measure and Discrete Surface Theories
 Coordinator: Blanche Buet.
 Duration: 48 months, starting October 2021.
 Abstract: This project is positioned at the interface between geometric measure and discrete surface theories. There has recently been a growing interest in non-smooth structures, both from a theoretical point of view, where singularities occur in famous optimization problems such as the Plateau problem or geometric flows such as mean curvature flow, and from an applied point of view, where complex high-dimensional data are no longer assumed to lie on a smooth manifold but are more singular and allow crossings, tree structures and dimension variations. We propose in this project to strengthen and expand the use of geometric measure concepts in discrete surface study and complex data modelling, and also to use those possibly singular discrete surfaces to compute numerical solutions to the aforementioned problems.
10.2.2 Collaboration with other national research institutes
IFPEN
Participants: Frédéric Chazal, Marc Glisse, Jisu Kim.
Research collaboration between DataShape and IFPEN on TDA applied to various problems arising from energy transition and sustainable mobility.
Confiance.ai / IRT SystemX
Participants: Frédéric Chazal.
Research collaboration on anomaly detection for multivariate time series using TDA and ML approaches.
10.3 Regional initiatives
Metafora
Participants: Gilles Blanchard, Bastien Dussap, Marc Glisse.
 Type : Paris Region PhD 2021.
 Title : Comparaison de données cytométriques.
The Île-de-France region funds a PhD thesis in collaboration with Metafora biosystems, a company specialized in the analysis of cells through their metabolism. Bastien Dussap is supervised by Gilles Blanchard and Marc Glisse and aims to compare blood samples using statistics.
11 Dissemination
11.1 Promoting scientific activities
11.1.1 Scientific events: organisation
Member of the organizing committees
 Blanche Buet is co-organizing the Harmonic Analysis seminar of the ANH team in Orsay.
11.1.2 Journal
Member of the editorial boards
 Gilles Blanchard was a member of the following journal editorial boards: Annals of Statistics, Electronic Journal of Statistics, Bernoulli.
 Frédéric Chazal is a member of the following journal editorial boards: Discrete and Computational Geometry (Springer), Graphical Models (Elsevier).
 Frédéric Chazal is the Editor-in-Chief of the Journal of Applied and Computational Topology (Springer).
 Clément Maria is co-editor of the CGTA Special Issue on algorithmic aspects of computational and applied topology.
11.1.3 Invited talks
 Blanche Buet gave talks in several team seminars: Math department of Mulhouse, Analyse/EDP seminar of UCLouvain (Belgium), analysis group seminar of CMAP (Palaiseau), and at conferences: Paris-London Analysis Seminar (UCL London, UK), Approximation Theory Workshop in FoCM 2023 (Paris, France), Off-the-grid workshop (IHP, France).
 Blanche Buet gave a 3-hour short course on varifolds in Aussois (France).
11.1.4 Leadership within the scientific community
 Frédéric Chazal is the Scientific Director of the DATAIA Institute at Université Paris-Saclay.
 Frédéric Chazal is a member of the board of directors of the DIM project AI4IDF of the Région Île-de-France.
 Clément Maria is co-head (with Théo Lacombe) of the GT GeoAlgo within the GdR IM.
 Clément Maria represents INRIA in the Steering Committee of the QuantAzur Federative Institute on quantum technologies.
11.1.5 Research administration
 Pierre Pansu was deputy director of the FMJH until August.
 Marc Glisse is president of the CDT at Inria Saclay.
 Blanche Buet is a member of the CCUPS (Commission Consultative de l'Université Paris-Saclay), of the Laboratory council and of the "comité parité" of LMO.
 Blanche Buet is a member of the FMJH postdoc selection committee.
11.2 Teaching  Supervision  Juries
11.2.1 Teaching
 Master: Frédéric Chazal, Analyse Topologique des Données, 30h eqTD, Université Paris-Sud, France.
 Master: Clément Maria, Computational Geometry Learning, 18h eqTD, M2, MPRI, France.
 Master: Frédéric Cazals and Mathieu Carrière, Foundations of Geometric Methods in Data Analysis, 24h eqTD, M2, École CentraleSupélec, France.
 Master: Frédéric Cazals, Jean-Daniel Boissonnat and Mathieu Carrière, Geometric and Topological Methods in Machine Learning, 24h eqTD, M2, Université Côte d'Azur, France.
 Master: Frédéric Chazal and Julien Tierny, Topological Data Analysis, 38h eqTD, M2, Mathématiques, Vision, Apprentissage (MVA), ENS Paris-Saclay, France.
 Master: Gilles Blanchard, Mathematics for Artificial Intelligence 1, 70h eqTD, IMO, Université Paris-Saclay, France.
 Master: Blanche Buet, TD Distributions et analyse de Fourier, 60h eqTD, M1, Université Paris-Saclay, France.
 Master: Marc Glisse, Conception et analyse d'algorithmes, 40h eqTD, M1, École Polytechnique, France.
 Master: Blanche Buet, Short Course Géométrie et approximation at the Rentrée des Masters 2023, Université Paris-Saclay, France.
 Undergrad: Marc Glisse, Mécanismes de la programmation orientée-objet, 40h eqTD, L3, École Polytechnique, France.
 Master: Nina Otter, Probabilités, 25h eqTD, M1 Mathématiques Fondamentales, Laboratoire de Mathématiques d'Orsay, France.
11.2.2 Supervision
 PhD: Vadim Lebovici, Laplace transform for constructible functions. Defended in September 2023. Steve Oudot and François Petit.
 PhD: Christophe Vuong, Random hypergraphs. Defended in December 2023. Laurent Decreusefond and Marc Glisse.
 PhD in progress: Bastien Dussap, Comparaison de données cytométriques, started October 1st, 2021, Gilles Blanchard and Marc Glisse.
 PhD: Olympio Hacquard, Apprentissage statistique par méthodes topologiques et géométriques, defended in September 2023, Gilles Blanchard and Clément Levrard.
 PhD in progress: Hannah Marienwald, Transfer learning in high dimension. Started September 2019. Gilles Blanchard and KlausRobert Müller.
 PhD in progress: JeanBaptiste Fermanian, Estimation de Kernel Mean Embedding et tests multiples en grande dimension. Started September 2021. Gilles Blanchard and Magalie FromontRenoir.
 PhD in progress: Antoine Commaret, Persistent Geometry. Started September 2021. David CohenSteiner and Indira Chatterji.
 PhD in progress: Lucas Brifault, Théorie de la mesure géométrique appliquée pour la modélisation de formes complexes. Started May 2022. David CohenSteiner and Mathieu Desbrun.
 PhD in progress: David Loiseaux, Multivariate topological data analysis for statistical machine learning. Started November 2021. Mathieu Carrière and Frédéric Cazals.
 PhD: Wojciech Reise, TDA for curve data. Defended in December 2023. Frédéric Chazal and Bertrand Michel.
 PhD in progress: Alexandre Guérin, Movement analysis from inertial sensors. Started in October 2021. Frédéric Chazal and Bertrand Michel.
 PhD in progress: Jérémie Capitao-Miniconi, Deconvolution for singular measures with geometric support. Started in October 2020. Frédéric Chazal and Elisabeth Gassiat.
 PhD in progress: Charly Boricaud, Geometric inference for Data analysis: a Geometric Measure Theory perspective. Started in October 2021. Blanche Buet, Gian Paolo Leonardi and Simon Masnou.
 PhD in progress: Hugo Henneuse, Statistical Foundations of Topological Data Analysis for multidimensional random fields. Started in October 2022. Frédéric Chazal and Pascal Massart.
 PhD in progress: Laure Ferraris, Measure-dependent metric learning and applications in Topological Data Analysis. Started in October 2022. Frédéric Chazal.
 PhD: Georg Grützner. Espaces de Möbius et géométrie à grande échelle. Defended in May 2023. Pierre Pansu.
 PhD in progress: Henrique Ennes, Computational Complexity Foundations of Quantum Topology. Started in October 2023. Clément Maria and Nicolas Nisse (INRIA).
11.2.3 Juries
 PhD defense jury: Nina Otter, PhD defense of Pepijn Roos Hoefgeest (December 2023, Free University Amsterdam, supervisor: Magnus Botnan)
 PhD defense jury: Blanche Buet, PhD defense of Elise Bonhomme (December 2023, LMO, supervisor: François Babadjian)
11.3 Popularization
11.3.1 Interventions
 Clément Maria, c@féin INRIAUCA, December 2023
12 Scientific production
12.1 Major publications
 1 article: Homological Reconstruction and Simplification in R3. Computational Geometry, 2014.
 2 article: Delaunay Triangulation of Manifolds. Foundations of Computational Mathematics 45, 2017, 38.
 3 article: Only distances are required to reconstruct submanifolds. Computational Geometry 66, 2017, 32-67.
 4 article: Building Efficient and Compact Data Structures for Simplicial Complexes. Algorithmica, September 2016.
 5 article: A Varifold Approach to Surface Approximation. Archive for Rational Mechanics and Analysis 226(2), November 2017, 639-694.
 6 article: A Sampling Theory for Compact Sets in Euclidean Space. Discrete Comput. Geom. 41(3), 2009, 461-479. URL: http://dx.doi.org/10.1007/s0045400991448
 7 article: Geometric Inference for Measures based on Distance Functions. Foundations of Computational Mathematics 11(6), RR-6930, 2011, 733-751.
 8 book: The Structure and Stability of Persistence Modules. SpringerBriefs in Mathematics, Springer Verlag, 2016, VII, 116.
 9 article: Persistence-Based Clustering in Riemannian Manifolds. Journal of the ACM 60(6), November 2013, 38.
 10 article: Variance-Minimizing Transport Plans for Inter-surface Mapping. ACM Transactions on Graphics 36, 2017, 14.
12.2 Publications of the year
International journals
 11 article Local Criteria for Triangulating General Manifolds. Discrete and Computational Geometry 69, 2023, 156–191. HAL, DOI
 12 article Tracing Isomanifolds in R^d in Time Polynomial in d using Coxeter–Freudenthal–Kuhn Triangulations. SIAM Journal on Computing 52, April 2023, 452–486. HAL, DOI
 13 article The reach of subsets of manifolds. Journal of Applied and Computational Topology, 2023. HAL, DOI
 14 article Local characterizations for decomposability of 2-parameter persistence modules. Algebras and Representation Theory, 2023. HAL, DOI
 15 article Hard Diagrams of the Unknot. Experimental Mathematics, February 2023, 1–19. HAL, DOI
 16 article Deconvolution of spherical data corrupted with unknown noise. Electronic Journal of Statistics 17(1), January 2023. HAL, DOI
 17 article Quantitative Stability of Barycenters in the Wasserstein Space. Probability Theory and Related Fields, October 2023. HAL
 18 article Covering families of triangles. Periodica Mathematica Hungarica 87, 2023, 86–109. HAL, DOI
 19 article Delaunay and Regular Triangulations as Lexicographic Optimal Chains. Discrete and Computational Geometry 70, May 2023, 1–50. HAL, DOI
 20 article Quantitative Stability of Optimal Transport Maps under Variations of the Target Measure. Duke Mathematical Journal, 2023. HAL
 21 article Heat diffusion distance processes: a statistically founded method to analyze graph data sets. Journal of Applied and Computational Topology, May 2023. HAL, DOI
 22 article A Gradient Sampling Algorithm for Stratified Maps with Applications to Topological Data Analysis. Mathematical Programming 202, 2023, 199–239. HAL, DOI
 23 article Discrete Morse Theory for Computing Zigzag Persistence. Discrete and Computational Geometry, November 2023, 538–552. HAL, DOI
 24 article On the persistent homology of almost surely $C^{0}$ stochastic processes. Journal of Applied and Computational Topology, July 2023. HAL, DOI
 25 article Post hoc false discovery proportion inference under a Hidden Markov Model. Test, September 2023. HAL, DOI
International peer-reviewed conferences
 26 inproceedings Label Shift Quantification with Robustness Guarantees via Distribution Feature Matching. Machine Learning and Knowledge Discovery in Databases: Research Track. ECML-PKDD 2023, 14173, Lecture Notes in Computer Science, Turin (IT), Italy, Springer Nature Switzerland, June 2023, 69–85. HAL, DOI
 27 inproceedings Hausdorff and Gromov-Hausdorff Stable Subsets of the Medial Axis. STOC 2023 - 55th Annual ACM Symposium on Theory of Computing. STOC 2023: Proceedings of the 55th Annual ACM Symposium on Theory of Computing, Orlando (Florida), United States, June 2023, 1768–1776. HAL, DOI
 28 inproceedings A Framework for Fast and Stable Representations of Multiparameter Persistent Homology Decompositions. NeurIPS 2023 - 36th Conference on Neural Information Processing Systems. Advances in Neural Information Processing Systems 36, New Orleans (LA), United States, June 2023. HAL
 29 inproceedings Stable Vectorization of Multiparameter Persistent Homology using Signed Barcodes as Measures. NeurIPS 2023 - 36th Conference on Neural Information Processing Systems. Advances in Neural Information Processing Systems 36, New Orleans (LA), United States, June 2023. HAL
 30 inproceedings Constant regret for sequence prediction with limited advice. Proceedings of The 34th International Conference on Algorithmic Learning Theory, PMLR. Algorithmic Learning Theory (ALT 2023), 201, Singapore, Singapore, February 2023, 1343–1386. HAL
 31 inproceedings Covariance Adaptive Best Arm Identification. Advances in Neural Information Processing Systems. NeurIPS 2023 - Neural Information Processing Systems, 36, New Orleans, United States, 2023. HAL
 32 inproceedings Variational Shape Reconstruction via Quadric Error Metrics. SIGGRAPH 2023 - The 50th International Conference & Exhibition On Computer Graphics & Interactive Techniques, Los Angeles, United States, August 2023. HAL, DOI
Scientific book chapters
 33 inbook Nonasymptotic one- and two-sample tests in high dimension with unknown covariance structure. PROMS. 425. Foundations of Modern Statistics : Festschrift in Honor of Vladimir Spokoiny, Berlin, Germany, November 6–8, 2019, Moscow, Russia, November 30, 2019. Springer Proceedings in Mathematics & Statistics, Springer International Publishing, July 2023, 121–162. HAL, DOI
Doctoral dissertations and habilitation theses
 34 thesis Möbius spaces and large-scale geometry. Université Paris-Saclay, May 2023. HAL
 35 thesis From topological features to machine learning models : a journey through persistence diagrams. Université Paris-Saclay, September 2023. HAL
 36 thesis Two complementary approaches in multiparameter persistence : interval-decompositions and constructible functions. Université Paris-Saclay, September 2023. HAL
Reports & preprints
 37 misc Tight bounds for the learning of homotopy à la Niyogi, Smale, and Weinberger for subsets of Euclidean spaces and of Riemannian manifolds. 2022. HAL, DOI
 38 misc Moment inequalities for sums of weakly dependent random fields. July 2023. HAL
 39 misc Simplicial subdivision of simplices of arbitrary dimension in a space of constant nonzero curvature with bounded quality. November 2023. HAL
 40 misc Dimensionality Reduction for Persistent Homology with Gaussian Kernels. January 2023. HAL
 41 misc Triangulating submanifolds: An elementary and quantified version of Whitney's method. 2023. HAL, DOI
 42 misc Flagfolds. May 2023. HAL
 43 misc Two Lower Bounds for Random Point Sets via Negative Association. 2023. HAL
 44 misc Support and distribution inference from noisy data. April 2023. HAL
 45 misc Burning or collapsing the medial axis is unstable. November 2023. HAL
 46 misc Choosing the parameter of the Fermat distance: navigating geometry and noise. 2023. HAL, DOI
 47 misc An algorithm for Tambara-Yamagami quantum invariants of 3-manifolds, parameterized by the first Betti number. November 2023. HAL
 48 misc Transductive conformal inference with adaptive scores. October 2023. HAL
 49 misc Fast persistent homology computation for functions on ℝ. January 2023. HAL
 50 misc Asymptotic-Möbius maps. February 2023. HAL
 51 misc Harmonic analysis on the boundary of hyperbolic groups. February 2023. HAL
 52 misc Statistical learning on measures: an application to persistence diagrams. March 2023. HAL
 53 misc Euler Characteristic Tools For Topological Data Analysis. March 2023. HAL
 54 misc MAGDiff: Covariate Data Set Shift Detection via Activation Graphs of Deep Neural Networks. May 2023. HAL
 55 misc The medial axis of closed bounded sets is Lipschitz stable with respect to the Hausdorff distance under ambient diffeomorphisms. November 2023. HAL
 56 misc Hausdorff and Gromov-Hausdorff stable subsets of the medial axis. April 2023. HAL, DOI
 57 misc Fast, Stable and Efficient Approximation of Multiparameter Persistence Modules with MMA. June 2023. HAL
 58 misc Topological signatures of periodic-like signals. June 2023. HAL
Other scientific publications
12.3 Other