## Section: New Results

### Modeling Macro-molecular Assemblies

Macro-molecular assembly, reconstruction by data integration, proteomics, modeling with uncertainties, curved Voronoi diagrams, topological persistence.

#### Connectivity Inference in Mass Spectrometry based Structure Determination

Participants: Frédéric Cazals, Deepesh Agarwal.

In collaboration with J. Araujo, C. Caillouet, D. Coudert, and S. Pérennes, from the COATI project-team (Inria-CNRS).

In [14], we consider the following Minimum Connectivity Inference problem (MCI): given vertex sets ${V}_{i}\subseteq V$, $i\in I$, find the graph $G=(V,E)$ minimizing the size of the edge set $E$, such that the sub-graph of $G$ induced by each ${V}_{i}$ is connected. This problem arises in structural biology, when one aims at finding the pairwise contacts between the proteins of a protein assembly, given the lists of proteins involved in sub-complexes. We present four contributions.
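To make the constraint concrete, the following minimal sketch checks whether a candidate edge set is feasible for MCI, that is, whether every ${V}_{i}$ induces a connected sub-graph. The function names `induced_connected` and `is_feasible`, and the toy protein labels, are illustrative assumptions and do not come from [14].

```python
from collections import defaultdict


def induced_connected(edges, subset):
    """Return True if the sub-graph induced by `subset` is connected."""
    subset = set(subset)
    if len(subset) <= 1:
        return True
    # Keep only the edges with both endpoints inside the subset.
    adj = defaultdict(set)
    for u, v in edges:
        if u in subset and v in subset:
            adj[u].add(v)
            adj[v].add(u)
    # Depth-first search from an arbitrary vertex of the subset.
    start = next(iter(subset))
    seen = {start}
    stack = [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == subset


def is_feasible(edges, subsets):
    """Return True if every subset induces a connected sub-graph."""
    return all(induced_connected(edges, s) for s in subsets)


# Hypothetical toy instance: three sub-complexes over proteins A..D.
toy_subsets = [{"A", "B", "C"}, {"B", "C"}, {"C", "D"}]
path_edges = [("A", "B"), ("B", "C"), ("C", "D")]
star_edges = [("A", "B"), ("A", "C"), ("C", "D")]
```

On this toy instance, `path_edges` is feasible, while `star_edges` is not: it connects $\{A,B,C\}$ through $A$, but the sub-complex $\{B,C\}$ has no edge of its own.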

First, using a reduction from the set cover problem, we establish that MCI is APX-hard.
Second, we show how to solve the problem to optimality using a mixed integer linear programming formulation (`MILP`).
Third, we develop a greedy algorithm based on union-find data structures (`Greedy`), yielding a $2\left({\log}_{2}|V|+{\log}_{2}\kappa \right)$-approximation, with $\kappa $ the maximum number of subsets ${V}_{i}$ a vertex belongs to.
Fourth, application-wise, we use `MILP` and `Greedy` to solve the aforementioned connectivity inference problem in structural biology.
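The greedy idea can be sketched as follows. This is one plausible union-find based heuristic, not necessarily the `Greedy` algorithm of [14], and no approximation guarantee is claimed for it: it keeps one union-find structure per subset ${V}_{i}$ and repeatedly adds the edge that merges components in the largest number of subsets. The names `UnionFind` and `greedy_mci` are assumptions, and vertices are assumed to be sortable labels.

```python
from itertools import combinations


class UnionFind:
    """Union-find over a fixed set of items, tracking the component count."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.components = len(self.parent)

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        self.parent[rx] = ry
        self.components -= 1
        return True


def greedy_mci(subsets):
    """Greedily pick edges until every subset induces a connected sub-graph."""
    subsets = [set(s) for s in subsets]
    forests = [UnionFind(s) for s in subsets]
    # Only pairs lying inside at least one subset can ever be useful.
    candidates = sorted({e for s in subsets for e in combinations(sorted(s), 2)})

    def gain(edge):
        # Number of subsets in which this edge would merge two components.
        u, v = edge
        return sum(
            1
            for s, f in zip(subsets, forests)
            if u in s and v in s and f.find(u) != f.find(v)
        )

    chosen = []
    while any(f.components > 1 for f in forests):
        u, v = max(candidates, key=gain)  # ties: first edge in sorted order
        for s, f in zip(subsets, forests):
            if u in s and v in s:
                f.union(u, v)
        chosen.append((u, v))
    return chosen


# Hypothetical toy instance: three sub-complexes over proteins A..D.
toy_edges = greedy_mci([{"A", "B", "C"}, {"B", "C"}, {"C", "D"}])
```

On the toy instance, the first edge chosen is `("B", "C")`, since it is the only pair shared by two sub-complexes; two more edges then connect `A` and `D`, for three edges in total.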
We show that the solutions of `MILP` and `Greedy` are more parsimonious than
those reported by the algorithm initially developed in
biophysics, whose solutions are not characterized in terms of optimality. Since `MILP`
outputs a set of optimal solutions, we introduce the notion of
*consensus solution*. Using assemblies whose pairwise contacts
are known exhaustively, we show an almost perfect agreement between
the contacts predicted by our algorithms and the experimentally
determined ones, especially for consensus solutions.
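To illustrate why an instance may admit several optimal solutions, the sketch below enumerates all minimum-size feasible edge sets of a tiny instance by brute force. This exhaustive search is a stand-in for the `MILP` formulation, which is not reproduced here, and it is only practical for a handful of vertices. The per-edge frequency computed at the end is an assumed way of summarizing a set of optimal solutions; it is not the definition of consensus solution used in [14]. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def induced_connected(edges, subset):
    """Return True if the sub-graph induced by `subset` is connected."""
    subset = set(subset)
    if len(subset) <= 1:
        return True
    inside = [(u, v) for u, v in edges if u in subset and v in subset]
    seen = {next(iter(subset))}
    grew = True
    while grew:  # grow the reached set until no edge extends it
        grew = False
        for u, v in inside:
            if (u in seen) != (v in seen):
                seen.update((u, v))
                grew = True
    return seen == subset


def all_optimal_solutions(subsets):
    """Enumerate every minimum-size edge set satisfying all subsets."""
    subsets = [set(s) for s in subsets]
    candidates = sorted({e for s in subsets for e in combinations(sorted(s), 2)})
    for k in range(len(candidates) + 1):
        found = [
            set(c)
            for c in combinations(candidates, k)
            if all(induced_connected(c, s) for s in subsets)
        ]
        if found:  # the smallest k with a feasible set is optimal
            return found
    return []


def edge_frequencies(solutions):
    """Fraction of the given solutions in which each edge appears."""
    counts = Counter(e for sol in solutions for e in sol)
    return {e: counts[e] / len(solutions) for e in counts}


# Hypothetical toy instance with two optimal solutions.
optima = all_optimal_solutions([{"A", "B", "C"}, {"A", "B"}])
freq = edge_frequencies(optima)
```

Here the contact `("A", "B")` appears in both optimal solutions, whereas `("A", "C")` and `("B", "C")` each appear in only one, so the first contact is the best supported by the set of optima.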