Section: New Results

Algorithmic aspects of topological and geometric data analysis

An Efficient Representation for Filtrations of Simplicial Complexes

Participant : Jean-Daniel Boissonnat.

In collaboration with Karthik C.S. (Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Israel).

A filtration over a simplicial complex K is an ordering of the simplices of K such that all prefixes in the ordering are subcomplexes of K. Filtrations are at the core of Persistent Homology, a major tool in Topological Data Analysis. In order to represent the filtration of a simplicial complex, the entire filtration can be appended to any data structure that explicitly stores all the simplices of the complex such as the Hasse diagram or the recently introduced Simplex Tree by Boissonnat and Maria [Algorithmica '14]. However, with the popularity of various computational methods that need to handle simplicial complexes, and with the rapidly increasing size of the complexes, the task of finding a compact data structure that can still support efficient queries is of great interest.
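As a concrete reading of this definition, the minimal Python sketch below checks whether a given ordering of simplices is a filtration, that is, whether every prefix is a simplicial complex. The function name `is_filtration` is hypothetical and not taken from [22] or from any library. It relies on a simple observation: it is enough to check that the codimension-1 faces of each simplex appear before it, since by induction all of its lower-dimensional faces are then present as well.

```python
from itertools import combinations


def is_filtration(order):
    """Return True if every prefix of `order` (a list of simplices,
    each given as an iterable of vertices) is a simplicial complex."""
    seen = set()
    for simplex in order:
        s = frozenset(simplex)
        if s in seen:
            return False  # each simplex must appear exactly once
        # All codimension-1 faces must already be present; by induction,
        # this implies that every face of s appears earlier in the order.
        if len(s) > 1 and any(frozenset(f) not in seen
                              for f in combinations(s, len(s) - 1)):
            return False
        seen.add(s)
    return True
```

For example, `[[0], [1], [0, 1]]` is a filtration of an edge, while `[[0], [0, 1], [1]]` is not, because the prefix `[[0], [0, 1]]` contains an edge without its second vertex.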

This direction has been recently pursued for the case of maintaining simplicial complexes. For instance, Boissonnat et al. [SoCG '15] considered storing the simplices that are maximal for the inclusion and Attali et al. [IJCGA '12] considered storing the simplices that block the expansion of the complex. Nevertheless, so far there has been no data structure that compactly stores the filtration of a simplicial complex, while also allowing the efficient implementation of basic operations on the complex.

In this work [22], we propose a new data structure called the Critical Simplex Diagram (CSD), a variant of the Simplex Array List (SAL) that we introduced in [SoCG '15]. Our data structure stores the filtration of a simplicial complex compactly and supports efficient implementations of a large range of basic operations. Moreover, we prove that our data structure is essentially optimal with respect to the requisite storage space. Next, we show that the CSD representation admits the following construction algorithms.

  • A new edge-deletion algorithm for the fast construction of flag complexes, whose running time depends only on the number of critical simplices and the number of vertices.

  • A new matrix-parsing algorithm to quickly construct relaxed strong Delaunay complexes, whose running time depends only on the number of witnesses and the dimension of the complex.
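The summary above does not spell out which simplices are critical. The Python sketch below assumes (this is our reading, not a definition quoted from [22]) that a simplex is critical when all of its proper cofaces have a strictly larger filtration value, i.e., when it is maximal in the subcomplex at its own filtration value. Under that assumption, and for a monotone filtration function (faces never enter after their cofaces), the value of any simplex equals the smallest value among the critical simplices containing it, so storing only the critical simplices, indexed by vertex in the spirit of the SAL, suffices to answer filtration-value queries. All names (`critical_simplices`, `build_csd`, `filtration_value`) are hypothetical, and the quadratic-time construction is for illustration only.

```python
from collections import defaultdict


def critical_simplices(f):
    """f maps frozenset simplices to filtration values (monotone).
    Assumed definition: sigma is critical if every proper coface
    of sigma has a strictly larger filtration value."""
    return {s: v for s, v in f.items()
            if all(f[t] > v for t in f if s < t)}


def build_csd(f):
    """Hypothetical CSD-like index: vertex -> list of (critical simplex, value)."""
    index = defaultdict(list)
    for s, v in critical_simplices(f).items():
        for u in s:
            index[u].append((s, v))
    return index


def filtration_value(index, simplex):
    """Filtration value of a nonempty simplex, or None if it is not in the complex."""
    s = frozenset(simplex)
    # Scan only the shortest vertex list, as an SAL-style query would.
    shortest = min((index.get(u, []) for u in s), key=len)
    values = [v for t, v in shortest if s <= t]
    return min(values) if values else None
```

In the small example used to test it (a triangle in which vertices 0 and 1 and the edge between them appear at value 0, and every other simplex at value 1), only two of the seven simplices are critical, yet every filtration value can be recovered.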

Discretized Riemannian Delaunay triangulations

Participants : Mael Rouxel-Labbé, Mathijs Wintraecken, Jean-Daniel Boissonnat.

Anisotropic meshes are desirable for various applications, such as the numerical solution of partial differential equations and computer graphics. In [27], we introduce an algorithm to compute discrete approximations of Riemannian Voronoi diagrams on 2-manifolds. This is not straightforward because geodesics (shortest paths between points), and therefore distances, cannot in general be computed exactly. Our implementation employs recent developments in the numerical computation of geodesic distances and is accelerated through the use of an underlying anisotropic graph structure. We give conditions that guarantee that our discrete Riemannian Voronoi diagram is combinatorially equivalent to the Riemannian Voronoi diagram and that its dual is an embedded triangulation, both when its edges are approximate geodesics and when they are straight segments. Both the theoretical guarantees on the approximation of the Voronoi diagram and the implementation are new and provide a step towards the practical application of Riemannian Delaunay triangulations.
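To illustrate the idea of a discrete Riemannian Voronoi diagram computed on a graph, the Python sketch below runs a multi-source Dijkstra search on a regular grid over [0,1]^2 and assigns each grid node to its nearest seed. This is a deliberately crude stand-in, not the method of [27]: edge lengths are approximated by evaluating the metric tensor at the edge midpoint, and shortest paths in an 8-connected grid graph replace the more accurate numerical geodesic computations mentioned above. The function names and the (a, b, c) encoding of the symmetric metric tensor [[a, b], [b, c]] are assumptions made for this example.

```python
import heapq
import math


def metric_length(p, q, metric):
    """Approximate Riemannian length of segment pq, using the metric
    tensor [[a, b], [b, c]] evaluated at the segment midpoint."""
    a, b, c = metric((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    dx, dy = q[0] - p[0], q[1] - p[1]
    return math.sqrt(a * dx * dx + 2 * b * dx * dy + c * dy * dy)


def discrete_voronoi(n, seeds, metric):
    """Label each node (i, j) of an n x n grid on [0,1]^2 with the index
    of its nearest seed in graph distance. Seeds are grid indices."""
    h = 1.0 / (n - 1)
    dist, label, heap = {}, {}, []
    for s, (i, j) in enumerate(seeds):
        dist[(i, j)] = 0.0
        label[(i, j)] = s
        heapq.heappush(heap, (0.0, i, j, s))
    steps = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
             if (di, dj) != (0, 0)]
    while heap:
        d, i, j, s = heapq.heappop(heap)
        if d > dist[(i, j)]:
            continue  # stale heap entry
        for di, dj in steps:
            ni, nj = i + di, j + dj
            if not (0 <= ni < n and 0 <= nj < n):
                continue
            nd = d + metric_length((i * h, j * h), (ni * h, nj * h), metric)
            if nd < dist.get((ni, nj), math.inf):
                dist[(ni, nj)] = nd
                label[(ni, nj)] = s  # node currently closest to seed s
                heapq.heappush(heap, (nd, ni, nj, s))
    return label, dist
```

On an 11 x 11 grid with seeds at (0, 5) and (5, 9), the node (5, 5) belongs to the seed at (5, 9) under the identity metric; multiplying the vertical component of the metric by 100 makes vertical motion ten times more expensive, and the same node switches to the seed at (0, 5).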

Efficient and Robust Persistent Homology for Measures

Participants : Frédéric Chazal, Steve Oudot.

In collaboration with M. Buchet (Tohoku University), D. Sheehy (Univ. Connecticut).

A new paradigm for point cloud data analysis has emerged recently, in which point clouds are no longer treated as mere compact sets but rather as empirical measures. A notion of distance to such measures has been defined and shown to be stable with respect to perturbations of the measure. This distance can easily be computed pointwise in the case of a point cloud, but its sublevel-sets, which carry the geometric information about the measure, remain hard to compute or approximate. This makes it challenging to adapt many powerful techniques based on the Euclidean distance to a point cloud to the more general setting of the distance to a measure on a metric space. We propose an efficient and reliable scheme to approximate the topological structure of the family of sublevel-sets of the distance to a measure. We obtain an algorithm for approximating the persistent homology of the distance to an empirical measure that works in arbitrary metric spaces. Precise quality and complexity guarantees are given, together with a discussion of the behavior of our approach in practice [17].
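For reference, the pointwise computation mentioned above is simple for an empirical measure on n points: with mass parameter m = k/n, the distance to the measure at x is the root mean square of the distances from x to its k nearest sample points. The Python sketch below implements this brute-force formula; the function name `dtm` and the rounding k = ⌈mn⌉ when mn is not an integer are our own assumptions, and the hard part addressed in [17], approximating the persistent homology of the sublevel-sets, is not attempted here.

```python
import math


def dtm(x, points, m):
    """Empirical distance to measure at x, for mass parameter 0 < m <= 1.

    Averages the squared distances to the k = ceil(m * n) nearest sample
    points (the rounding rule is an assumption of this sketch).
    """
    n = len(points)
    k = max(1, math.ceil(m * n))
    sq = sorted(sum((a - b) ** 2 for a, b in zip(x, p)) for p in points)
    return math.sqrt(sum(sq[:k]) / k)
```

With the four sample points (0, 0), (1, 0), (0, 2) and an isolated outlier at (10, 10), the distance to the point cloud vanishes at the outlier, whereas the distance to the measure with m = 0.5 equals √82 there, which illustrates why this notion is robust to outliers.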

Shallow Packings in Geometry

Participants : Kunal Dutta, Arijit Ghosh.

A merged paper with Esther Ezra (School of Mathematics, Georgia Institute of Technology, Atlanta, U.S.A.).

We refine the bound on the packing number, originally shown by Haussler, for shallow geometric set systems. Specifically, let $V$ be a finite set system defined over an $n$-point set $X$; we view $V$ as a set of indicator vectors over the $n$-dimensional unit cube. A $\delta$-separated set of $V$ is a subcollection $W$ such that the Hamming distance between each pair $u, v \in W$ is greater than $\delta$, where $\delta > 0$ is an integer parameter. The $\delta$-packing number is then defined as the cardinality of the largest $\delta$-separated subcollection of $V$. Haussler showed an asymptotically tight bound of $\Theta((n/\delta)^d)$ on the $\delta$-packing number if $V$ has VC-dimension (or primal shatter dimension) $d$. We refine this bound for the scenario where, for any subset $X' \subseteq X$ of size $m \le n$ and for any parameter $1 \le k \le m$, the number of vectors of length at most $k$ in the restriction of $V$ to $X'$ is only $O(m^{d_1} k^{d-d_1})$, for a fixed integer $d > 0$ and a real parameter $1 \le d_1 \le d$ (this generalizes the standard notion of bounded primal shatter dimension when $d_1 = d$). In this case, when $V$ is "$k$-shallow" (all vector lengths are at most $k$), we show that its $\delta$-packing number is $O(n^{d_1} k^{d-d_1} / \delta^d)$, matching Haussler's bound for the special cases where $d_1 = d$ or $k = n$. We present two proofs: the first is an extension of Haussler's approach, and the second extends the proof of Chazelle, originally presented as a simplification of Haussler's proof [21].

  • A new tight upper bound for shallow-packings in δ-separated set systems of bounded primal shatter dimension.
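A small illustration of these definitions: intervals on a line fit the hypothesis with d = 2 and d_1 = 1, since among m points there are at most mk nonempty intervals containing at most k points, so the bound above reads O(nk/δ^2) for k-shallow interval systems. The Python sketch below (all names hypothetical) builds such a system and greedily extracts a maximal δ-separated subcollection; a greedy packing is only a lower bound on the δ-packing number, which the results above bound from above.

```python
def hamming(u, v):
    """Hamming distance between indicator vectors = size of the symmetric difference."""
    return len(u ^ v)


def intervals(n, k):
    """All nonempty intervals {i, ..., j} of X = {0, ..., n-1} with at most k points."""
    return [frozenset(range(i, j + 1))
            for i in range(n) for j in range(i, min(n, i + k))]


def greedy_delta_packing(sets, delta):
    """Greedily build a maximal delta-separated subcollection of `sets`:
    keep a set only if it is at Hamming distance > delta from all kept sets."""
    packing = []
    for s in sets:
        if all(hamming(s, t) > delta for t in packing):
            packing.append(s)
    return packing
```

By construction, every set that is not kept lies within Hamming distance δ of some kept set, so the output cannot be extended while remaining δ-separated.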

On Subgraphs of Bounded Degeneracy in Hypergraphs

Participants : Kunal Dutta, Arijit Ghosh.

A $k$-uniform hypergraph has degeneracy bounded by $d$ if every induced subgraph has a vertex of degree at most $d$. Given a $k$-uniform hypergraph $H=(V(H),E(H))$, we show that there exists a $d$-degenerate induced subgraph of size at least

$$\sum_{v \in V(H)} \min\left\{1,\; c_k \left(\frac{d+1}{d_H(v)+1}\right)^{1/(k-1)}\right\},$$

where $c_k = 2^{-\left(1+\frac{1}{k-1}\right)}\left(1-\frac{1}{k}\right)$ and $d_H(v)$ denotes the degree of vertex $v$ in the hypergraph $H$. This extends and generalizes a result of Alon-Kahn-Seymour (Graphs and Combinatorics, 1987) for graphs, as well as a result of Dutta-Mubayi-Subramanian (SIAM Journal on Discrete Mathematics, 2012) for linear hypergraphs, to general $k$-uniform hypergraphs. We also generalize the results of Srinivasan and Shachnai (SIAM Journal on Discrete Mathematics, 2004) from independent sets (0-degenerate subgraphs) to $d$-degenerate subgraphs. We further give a simple non-probabilistic proof of the Dutta-Mubayi-Subramanian bound for linear $k$-uniform hypergraphs, which extends the Alon-Kahn-Seymour proof technique to hypergraphs. Our proof combines the random permutation technique of Boppana-Caro-Wei (see, e.g., The Probabilistic Method by N. Alon and J. H. Spencer, and Dutta-Mubayi-Subramanian) and of Beame-Luby (SODA, 1990) with a new local density argument, which may be of independent interest. We also provide some applications in discrete geometry and address some natural algorithmic questions [28].

  • A new algorithmic lower bound for largest d-degenerate subgraphs in k-uniform hypergraphs.
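The Python sketch below illustrates these notions on small hypergraphs given as lists of frozensets; all function names are hypothetical. `is_d_degenerate` tests degeneracy by repeatedly deleting a vertex of minimum degree. `random_greedy_degenerate` is a simple random-order heuristic in the spirit of the random permutation technique, not the proof of [28]: it keeps a vertex when at most d hyperedges containing it lie inside the vertices kept so far, which guarantees a d-degenerate result, because in any subset of the output the last kept vertex has degree at most d. `lower_bound` evaluates the bound stated above.

```python
import random
from itertools import combinations


def complete_hypergraph(n, k):
    """All k-element subsets of {0, ..., n-1}, as frozensets."""
    return [frozenset(e) for e in combinations(range(n), k)]


def is_d_degenerate(vertices, edges, d):
    """Check d-degeneracy of the sub-hypergraph induced by `vertices` by
    repeatedly deleting a minimum-degree vertex; this succeeds iff that
    minimum degree never exceeds d."""
    alive = set(vertices)
    induced = [e for e in edges if e <= alive]
    while alive:
        deg = {v: 0 for v in alive}
        for e in induced:
            for v in e:
                deg[v] += 1
        v = min(alive, key=deg.__getitem__)
        if deg[v] > d:
            return False
        alive.remove(v)
        induced = [e for e in induced if v not in e]
    return True


def random_greedy_degenerate(vertices, edges, d, seed=0):
    """Scan vertices in random order; keep v if at most d hyperedges
    containing v lie inside (kept vertices) + v."""
    order = list(vertices)
    random.Random(seed).shuffle(order)
    kept = set()
    for v in order:
        candidate = kept | {v}
        if sum(1 for e in edges if v in e and e <= candidate) <= d:
            kept.add(v)
    return kept


def lower_bound(vertices, edges, d, k):
    """Evaluate sum_v min(1, c_k * ((d+1)/(d_H(v)+1))^(1/(k-1)))."""
    c_k = 2 ** (-(1 + 1 / (k - 1))) * (1 - 1 / k)
    deg = {v: 0 for v in vertices}
    for e in edges:
        for v in e:
            deg[v] += 1
    return sum(min(1.0, c_k * ((d + 1) / (deg[v] + 1)) ** (1 / (k - 1)))
               for v in vertices)
```

For the complete 3-uniform hypergraph on 7 vertices and d = 1, the heuristic always keeps exactly 3 vertices, while the stated bound evaluates to 7/12 there.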

A Simple Proof of Optimal Epsilon Nets

Participants : Kunal Dutta, Arijit Ghosh.

In collaboration with Nabil Mustafa (Université Paris-Est, Laboratoire d'Informatique Gaspard-Monge, ESIEE Paris, France.)

Showing the existence of ε-nets of small size has been the subject of investigation for almost 30 years, starting from the initial breakthrough of Haussler and Welzl (1987). Following a long line of successive improvements, recent results have settled the question of the size of the smallest ε-nets for set systems as a function of their so-called shallow-cell complexity.

In this paper we give a short proof of this result in a few elementary paragraphs, showing that it follows by combining the ε-net bound of Haussler and Welzl (1987) with a variant of Haussler's packing lemma (1991).

This implies all known results on unweighted ε-nets studied over the past 30 years, from the result of Matoušek, Seidel and Welzl (1990) to those of Clarkson and Varadarajan (2007), Varadarajan (2010), and Chan, Grant, Könemann and Sharpe (2012) for the unweighted case, as well as the technical and intricate paper of Aronov, Ezra and Sharir (2010) [40].

  • A new unified proof for all known bounds on unweighted ε-nets studied in the last 30 years.
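To make the notion concrete, the Python sketch below works with the simplest set system, intervals on a line (VC dimension 2): a set N of points is an ε-net if every interval containing at least ε·n of the n points contains a point of N. It provides a checker and a random-sampling construction following the Haussler-Welzl bound, under which a uniform sample of size of order (d/ε)·log(d/ε) is an ε-net with constant probability; the constant `c` and all function names are illustrative assumptions, and the packing-based argument of [40] is not reproduced.

```python
import math
import random


def is_eps_net_for_intervals(n, net, eps):
    """Points are 0..n-1 on a line; ranges are intervals (runs of
    consecutive points). Check that every interval with at least
    eps * n points contains a point of `net`."""
    t = math.ceil(eps * n)  # minimum size of a "heavy" interval
    net = set(net)
    # Any heavy interval contains a run of exactly t consecutive points,
    # so it is enough to check all such runs.
    return all(any(p in net for p in range(s, s + t))
               for s in range(n - t + 1))


def random_eps_net(n, eps, d=2, c=2.0, seed=0):
    """Uniform random sample of size about c * (d / eps) * log(d / eps);
    by Haussler-Welzl it is an eps-net with constant probability
    (the constant c is an assumption of this sketch)."""
    size = min(n, math.ceil(c * d / eps * math.log(d / eps)))
    return random.Random(seed).sample(range(n), size)
```

For intervals, a deterministic ε-net also exists: taking every ⌈εn⌉-th point, e.g. the points 24, 49, 74 and 99 for n = 100 and ε = 0.25, hits every heavy interval.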

Combinatorics of Set Systems with Small Shallow Cell Complexity: Optimal Bounds via Packings

Participants : Kunal Dutta, Arijit Ghosh.

In collaboration with Bruno Jartoux and Nabil Mustafa (Université Paris-Est Marne-la-Vallée, Laboratoire d'Informatique Gaspard-Monge, ESIEE Paris, France.)

The packing lemma of Haussler states that, given a set system (X,R) with bounded VC dimension, if every pair of sets in R is "far apart" (i.e., has large symmetric difference), then R cannot contain too many sets. This has turned out to be the technical foundation for many results in geometric discrepancy using the entropy method, as well as for recent work on set systems with bounded VC dimension. Recently it was generalized to the shallow packing lemma [Dutta-Ezra-Ghosh SoCG 2015, Mustafa DCG 2016], which applies to set systems as a function of their shallow cell complexity. In this paper we present several new results and applications related to packings:

  1. an optimal lower bound for shallow packings, thus settling the open question in Ezra (SODA 2014) and Dutta et al. (SoCG 2015),

  2. improved bounds on M-nets, providing a combinatorial analogue to Macbeath regions in convex geometry (Annals of Mathematics, 1952),

  3. a simplification and generalization of the main technical tool in Fox et al. (J. of the EMS, 2016).

Besides using the packing lemma and a combinatorial construction, our proofs combine tools from polynomial partitioning and the probabilistic method. [37]

  • A new optimal lower bound for shallow packings.

  • New improved bounds for M-nets, combinatorial analogues of Macbeath regions in convex geometry.

A new asymmetric correlation inequality for Gaussian measure

Participants : Kunal Dutta, Arijit Ghosh.

In collaboration with Nabil Mustafa (Université Paris-Est Marne-la-Vallée, Laboratoire d'Informatique Gaspard-Monge, ESIEE Paris, France.)

The Khatri-Šidák lemma says that for any centered Gaussian measure $\mu$ over $\mathbb{R}^n$, given a convex set $K$ and a slab $L$, both symmetric about the origin, one has $\mu(K \cap L) \ge \mu(K)\,\mu(L)$. We state and prove a new asymmetric version of the Khatri-Šidák lemma when $K$ is a symmetric convex body and $L$ is a slab (not necessarily symmetric about the barycenter of $K$). Our result also extends that of Szarek and Werner (1999), in a special case.

  • A new asymmetric inequality for Gaussian measure [38].
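The new asymmetric statement is not reproduced here. As a sanity check of the classical symmetric case only, the Python sketch below estimates μ(K ∩ L) by Monte Carlo for the standard (centered) Gaussian measure on the plane, with K the square [-1, 1]^2 and L the slab |x + y| ≤ 1, and compares it with the exact product μ(K)μ(L). The choice of K and L, the sample size and the function names are assumptions made for this illustration.

```python
import math
import random


def in_K(x, y):
    """K: the square [-1, 1]^2, a convex body symmetric about the origin."""
    return abs(x) <= 1 and abs(y) <= 1


def in_L(x, y):
    """L: the slab |x + y| <= 1, symmetric about the origin."""
    return abs(x + y) <= 1


def estimate_mu_K_cap_L(samples=100_000, seed=1):
    """Monte Carlo estimate of mu(K ∩ L) under the standard Gaussian on R^2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        if in_K(x, y) and in_L(x, y):
            hits += 1
    return hits / samples


# Exact values: P(|X| <= 1) = erf(1/sqrt(2)) for X ~ N(0, 1), and
# X + Y ~ N(0, 2), so P(|X + Y| <= 1) = erf(1/2).
MU_K = math.erf(1 / math.sqrt(2)) ** 2
MU_L = math.erf(0.5)
```

With these choices μ(K)μ(L) ≈ 0.24, and the estimate of μ(K ∩ L) comes out noticeably larger, as the lemma predicts for this symmetric configuration.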