Section:
New Results
Modeling Macro-molecular Assemblies
Macro-molecular assembly, reconstruction by data integration,
proteomics, modeling with uncertainties, curved Voronoi diagrams,
topological persistence.
Connectivity Inference in Mass Spectrometry based Structure Determination
Participants :
Frédéric Cazals, Deepesh Agarwal.
In collaboration with J. Araujo, C. Caillouet, D. Coudert, and
S. Pérennes, from the COATI project-team (Inria-CNRS).
In [14], we consider the following Minimum Connectivity Inference problem (MCI), which arises in structural biology:
given vertex sets $V_i \subseteq V$, $i \in I$, find the graph $G = (V, E)$ minimizing
the size of the edge set $E$, such that the sub-graph of $G$ induced by each
$V_i$ is connected.
Concretely, the goal is to find
the pairwise contacts between the proteins of a protein assembly,
given the lists of proteins involved in sub-complexes.
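To make the connectivity constraint concrete, the following minimal Python sketch checks whether a candidate set of contacts is a feasible MCI solution, i.e. whether every sub-complex induces a connected sub-graph. The function names `induced_connected` and `is_feasible` and the toy protein labels are assumptions made for illustration; they do not come from [14].

```python
def induced_connected(edges, subset):
    """Return True if the sub-graph induced by `subset` is connected."""
    subset = set(subset)
    if not subset:
        return True
    # Keep only the edges whose two endpoints both lie in the subset.
    adjacency = {v: set() for v in subset}
    for u, v in edges:
        if u in subset and v in subset:
            adjacency[u].add(v)
            adjacency[v].add(u)
    # Depth-first search from an arbitrary vertex of the subset.
    start = next(iter(subset))
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == subset


def is_feasible(edges, vertex_sets):
    """Return True if every vertex set induces a connected sub-graph."""
    return all(induced_connected(edges, s) for s in vertex_sets)


# Hypothetical toy instance: three proteins, three sub-complexes.
sub_complexes = [{"A", "B"}, {"B", "C"}, {"A", "B", "C"}]
print(is_feasible([("A", "B"), ("B", "C")], sub_complexes))
print(is_feasible([("A", "B"), ("A", "C")], sub_complexes))
```

In the second call the sub-complex {B, C} contains no contact, so the edge set is rejected even though the whole assembly is connected.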
We present four contributions.
First, using a reduction from the set cover problem, we establish that
MCI is APX-hard.
Second, we show how to solve the problem to optimality using a mixed
integer linear programming formulation (MILP).
Third, we develop a greedy algorithm based on union-find data
structures (Greedy), yielding a $2(\log_2 |V| + \log_2 \kappa)$-approximation,
where $|V|$ is the number of vertices and $\kappa$ the maximum number of
subsets a vertex belongs to.
Fourth, application-wise, we use the MILP and the greedy heuristic to
solve the aforementioned connectivity inference problem in structural
biology.
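As an illustration of how union-find structures can drive a greedy heuristic for this problem, here is a minimal Python sketch. It keeps one union-find forest per vertex set and repeatedly adds the edge that merges components in the largest number of vertex sets. This selection rule, the tie-breaking by sorted order, and the names `UnionFind` and `greedy_mci` are assumptions made for illustration; the sketch is not claimed to reproduce the Greedy algorithm of [14] or its approximation guarantee.

```python
from itertools import combinations


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


def greedy_mci(vertex_sets):
    """Return a list of edges making every induced sub-graph connected."""
    sets = [frozenset(s) for s in vertex_sets]
    forests = [UnionFind(s) for s in sets]
    # Only pairs lying together in at least one vertex set can be useful.
    candidates = sorted(
        {tuple(sorted(pair)) for s in sets for pair in combinations(s, 2)}
    )
    edges = []
    while True:
        best, best_gain = None, 0
        for u, v in candidates:
            # Gain: number of vertex sets in which (u, v) joins two components.
            gain = sum(
                1
                for s, forest in zip(sets, forests)
                if u in s and v in s and forest.find(u) != forest.find(v)
            )
            if gain > best_gain:
                best, best_gain = (u, v), gain
        if best is None:
            break  # every vertex set is already connected
        u, v = best
        for s, forest in zip(sets, forests):
            if u in s and v in s:
                forest.union(u, v)
        edges.append(best)
    return edges


# Hypothetical toy instance: three proteins, three sub-complexes.
print(greedy_mci([{"A", "B"}, {"B", "C"}, {"A", "B", "C"}]))
```

Each selected edge reduces the total number of components summed over all vertex sets by at least one, so the loop terminates once every set is connected.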
We show that the solutions of MILP and Greedy are more parsimonious than
those reported by the algorithm initially developed in
biophysics, whose solutions are not characterized in terms of optimality. Since MILP
outputs a set of optimal solutions, we introduce the notion of
consensus solution. Using assemblies whose pairwise contacts
are known exhaustively, we show an almost perfect agreement between
the contacts predicted by our algorithms and the experimentally
determined ones, especially for consensus solutions.