Section: New Results
Algorithmic foundations
Keywords: Computational geometry, computational topology, optimization, data analysis.
Making progress on the biophysical questions discussed in the previous sections requires the methodological developments presented below.
Extracting the groupwise core structural connectivity network: bridging statistical and graph-theoretical approaches
Participant : D. Mazauric.
In collaboration with N. Lascano (Universidad de Buenos Aires, Argentina, Université Côte d'Azur, and Inria Sophia Antipolis - Méditerranée, EPI ATHENA), G. Gallardo and D. Wassermann (Université Côte d'Azur and Inria Sophia Antipolis - Méditerranée, EPI ATHENA).
Finding the common structural brain connectivity network for a given population is an open problem that is crucial for current neuroscience. Recent evidence suggests that there is a tightly connected network shared among humans. Obtaining this network will, among many advantages, allow us to focus cognitive and clinical analyses on common connections, thus increasing their statistical power. In turn, knowledge about the common network will facilitate novel analyses to understand the structure-function relationship in the brain. In [19], we present a new algorithm, combining graph theory and statistics, for computing the core structural connectivity network of a subject sample. Our algorithm is consistent with recent evidence on brain topology. We analyze the problem theoretically and establish its computational complexity. Using 309 subjects, we show the advantages of the algorithm when used as a feature selection method for connectivity analysis on populations, where it outperforms current approaches.
Maximum flow under proportional delay constraint
Participant : D. Mazauric.
In collaboration with P. Bonami (LIF, UMR d'Aix-Marseille Université et du CNRS, and IBM ILOG CPLEX, Madrid), Y. Vaxès (LIF, UMR d'Aix-Marseille Université et du CNRS).
Network operators must satisfy Quality of Service requirements for their clients. One of the most important parameters in telecommunication networks is the end-to-end delay of a unit of flow between a source node and a destination node. Given a network and a set of source-destination pairs (connections), we consider in [14] the problem of maximizing the sum of the flows under proportional delay constraints. In this setting, the delay for crossing a link is proportional to the total flow crossing this link. If a connection supports non-zero flow, then the sum of the delays along any path corresponding to that connection must be lower than a given bound. The delay constraints are on-off constraints: if a connection carries zero flow, then there is no constraint for that connection. The difficulty of the problem comes from choosing which connections support non-zero flow. We first prove a general approximation ratio using linear programming for a variant of the problem. We then give a linear-time 2-approximation algorithm when the network is a path. We finally give a polynomial-time approximation scheme (PTAS) when the graph of intersections of the paths has bounded treewidth.
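The on-off delay constraint described above can be illustrated with a minimal Python sketch. It only checks whether a given flow assignment is feasible and computes the objective; it is not the approximation algorithm of [14]. The function names, the single-path-per-connection simplification, the proportionality coefficient `coeff`, and the use of a non-strict inequality are assumptions made for illustration.

```python
from typing import Dict, Hashable, List

Link = Hashable


def link_loads(paths: Dict[str, List[Link]], flows: Dict[str, float]) -> Dict[Link, float]:
    """Total flow crossing each link, summed over all connections using it."""
    loads: Dict[Link, float] = {}
    for name, path in paths.items():
        for link in path:
            loads[link] = loads.get(link, 0.0) + flows.get(name, 0.0)
    return loads


def is_feasible(paths: Dict[str, List[Link]],
                flows: Dict[str, float],
                bounds: Dict[str, float],
                coeff: float = 1.0,
                tol: float = 1e-9) -> bool:
    """Check the on-off proportional delay constraints.

    Assumption: each connection uses a single path, and the delay of a link
    equals coeff times the total flow crossing it.
    """
    loads = link_loads(paths, flows)
    for name, path in paths.items():
        if flows.get(name, 0.0) <= 0.0:
            continue  # on-off: a connection with zero flow is unconstrained
        delay = sum(coeff * loads[link] for link in path)
        if delay > bounds[name] + tol:
            return False
    return True


def total_flow(flows: Dict[str, float]) -> float:
    """Objective value: the sum of the flows of all connections."""
    return sum(flows.values())


# Hypothetical instance on a path network with links e1, e2, e3.
paths = {"a": ["e1", "e2"], "b": ["e2", "e3"]}
bounds = {"a": 3.0, "b": 2.0}
print(is_feasible(paths, {"a": 1.0, "b": 1.0}, bounds))  # b exceeds its bound
print(is_feasible(paths, {"a": 1.0, "b": 0.0}, bounds))  # b is switched off
```

In this toy instance, switching connection b off removes its constraint and lowers the load on the shared link e2, which is why selecting the active connections is the combinatorial core of the problem.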
Comparing two clusterings using matchings between clusters of clusters
Participants : F. Cazals, D. Mazauric, R. Tetley, R. Watrigant.
Clustering is a fundamental problem in data science, yet the variety of clustering methods and their sensitivity to parameters make clustering hard. To analyze the stability of a given clustering algorithm while varying its parameters, and to compare clusters yielded by different algorithms, several comparison schemes based on matchings, information theory and various indices (Rand, Jaccard) have been developed. In this work [20], we go beyond these by providing a novel class of methods that compute meta-clusters within each clustering (a meta-cluster is a group of clusters), together with a matching between these meta-clusters. Let the intersection graph of two clusterings be the edge-weighted bipartite graph in which the nodes represent the clusters, the edges represent the non-empty intersections between two clusters, and the weight of an edge is the number of common items. We introduce the so-called D-family-matching problem on intersection graphs, where D is the upper bound on the diameter of the graph induced by the clusters of any meta-cluster. First, we prove NP-completeness results and show that simple strategies have unbounded approximation ratios. Second, we design exact polynomial-time dynamic programming algorithms for some classes of graphs (in particular trees). Then, we design efficient spanning-tree based algorithms for general graphs. Our experiments illustrate the role of D as a scale parameter providing information on the relationship between clusters within a clustering and between two clusterings. They also show the advantages of our built-in mapping over classical cluster comparison measures such as the variation of information (VI).
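The intersection graph defined above, and the variation of information used as a baseline, can be sketched in a few lines of Python. This is an illustrative sketch only, not the D-family-matching algorithms of [20] and not code from the SBL. It assumes each clustering is given as a list of cluster labels, one per item; the function names are hypothetical.

```python
from collections import Counter
from math import log
from typing import Hashable, Sequence


def intersection_graph(labels_a: Sequence[Hashable],
                       labels_b: Sequence[Hashable]) -> Counter:
    """Edge weights of the bipartite intersection graph of two clusterings.

    Key (a, b) is an edge between cluster a of the first clustering and
    cluster b of the second; its weight is the number of common items.
    Pairs with an empty intersection do not appear, so they are not edges.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    return Counter(zip(labels_a, labels_b))


def variation_of_information(labels_a: Sequence[Hashable],
                             labels_b: Sequence[Hashable]) -> float:
    """Variation of information H(A|B) + H(B|A), in nats."""
    n = len(labels_a)
    weights = intersection_graph(labels_a, labels_b)
    size_a = Counter(labels_a)
    size_b = Counter(labels_b)
    vi = 0.0
    for (a, b), w in weights.items():
        # w / size_a[a] is the conditional probability of cluster b given a.
        vi -= (w / n) * (log(w / size_a[a]) + log(w / size_b[b]))
    return vi


# Hypothetical example: four items, two clusterings.
first = [0, 0, 1, 1]
second = [0, 0, 0, 1]
for (a, b), w in sorted(intersection_graph(first, second).items()):
    print(f"cluster {a} of first -- cluster {b} of second: weight {w}")
print(variation_of_information(first, second))
```

The variation of information collapses the whole graph into a single number, whereas the edge weights retain which clusters overlap and by how much, which is the information a matching between meta-clusters works with.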
The SBL
Participants : F. Cazals, T. Dreyfus.
Software in structural bioinformatics has mainly been application driven. To serve both practitioners seeking off-the-shelf applications and developers seeking advanced building blocks to develop novel applications, we undertook the design of the Structural Bioinformatics Library (SBL), a generic C++/Python cross-platform software library targeting complex problems in structural bioinformatics. Its central tenet is a modular design offering a rich and versatile framework that allows the development of novel applications requiring well-specified complex operations, without compromising robustness or performance.
The SBL involves four software components (numbered 1 to 4 below) [15]. For end-users, the SBL provides ready-to-use, state-of-the-art (1) applications to handle molecular models defined by unions of balls, to deal with molecular flexibility, and to model macro-molecular assemblies. These applications can also be combined to tackle integrated analysis problems. For developers, the SBL provides a broad C++ toolbox with a modular design, involving core (2) algorithms, (3) biophysical models and (4) modules, the latter being especially suited to developing novel applications. The SBL comes with thorough documentation consisting of user and reference manuals, and a Bugzilla platform to handle community feedback.
The SBL is available from http://sbl.inria.fr
See also the section New Software and Platforms.