**Computational Biology and Computational Structural Biology.** Understanding the lineage between species and the genetic drift of genes and genomes, apprehending the control and feedback
loops governing the behavior of a cell, a tissue, an organ or a body, and inferring the relationship between the structure of biological (macro)-molecules and their functions are amongst the
major challenges of modern biology. The investigation of these challenges is supported by three types of data: genomic data, transcription and expression data, and structural data.

Genomic data feature sequences of nucleotides on DNA and RNA molecules, and are symbolic data whose processing falls in the realm of Theoretical Computer Science: dynamic programming, algorithms on texts and strings, graph theory dedicated to phylogenetic problems. Transcription and expression data feature evolving concentrations of molecules (RNAs, proteins, metabolites) over time, and fit in the formalism of discrete and continuous dynamical systems, and of graph theory. The exploration and the modeling of these data are covered by a rapidly expanding research field termed *systems biology*. Structural data encode information about *folding*, the process through which a protein adopts its three-dimensional structure, and *docking*, the process through which two or several molecules assemble. Folding and docking are driven by non-covalent interactions and, for complex systems, are actually intertwined. Apart from the bio-physical interests raised by these processes, two different application domains are concerned: in fundamental biology, one is primarily interested in understanding the machinery of the cell; in medicine, applications to drug design are developed.

**Modeling in Computational Structural Biology.** Acquiring structural data is not always possible: NMR is restricted to relatively small molecules; membrane proteins are notoriously difficult to crystallize, etc. As a matter of fact, while the order of magnitude of the number of genomes sequenced is one thousand, the Protein Data Bank contains (a mere) 45,000 structures. (Because one gene may yield a number of proteins through splicing, it is difficult to estimate the number of proteins from the number of genes. However, the latter is several orders of magnitude beyond the former.) For these reasons, *molecular modeling* is expected to play a key role in investigating structural issues.

Ideally, bio-physical models of macro-molecules should resort to quantum mechanics. While this is possible for small systems, say up to 50 atoms, large systems are investigated within the framework of the Born-Oppenheimer approximation, which stipulates that the motions of the nuclei and of the electron cloud can be decoupled. Example force fields developed in this realm are AMBER, CHARMM and OPLS. Of particular importance are Van der Waals models, where each atom is modeled by a sphere whose radius depends on the chemical type of the atom. From a historical perspective, Richards, and later Connolly, while defining molecular surfaces and developing algorithms to compute them, established the connections between molecular modeling and geometric constructions. Remarkably, a number of difficult problems (e.g. additively weighted Voronoi diagrams) were touched upon in these early days.
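To make the Van der Waals model concrete, the following minimal sketch represents a molecule as a collection of balls and lists the pairs of balls that intersect. The radii are commonly quoted Bondi-style values, used here only as illustrative assumptions; `Atom` and `overlapping_pairs` are hypothetical names and do not belong to any force field package.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist

# Illustrative Van der Waals radii in angstroms (assumed values, Bondi-style).
VDW_RADIUS = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


@dataclass(frozen=True)
class Atom:
    element: str
    xyz: tuple  # (x, y, z) in angstroms

    @property
    def radius(self) -> float:
        # The radius depends only on the chemical type of the atom.
        return VDW_RADIUS[self.element]


def overlapping_pairs(atoms):
    """Return the index pairs (i, j), i < j, whose Van der Waals balls intersect."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        if dist(a.xyz, b.xyz) < a.radius + b.radius:
            pairs.append((i, j))
    return pairs


# Toy data: a carbon and an oxygen at bonding distance, and a distant nitrogen.
atoms = [
    Atom("C", (0.0, 0.0, 0.0)),
    Atom("O", (1.2, 0.0, 0.0)),
    Atom("N", (10.0, 0.0, 0.0)),
]
print(overlapping_pairs(atoms))
```

The quadratic pairwise loop is only meant for illustration; geometric structures such as Voronoi diagrams avoid examining all pairs on realistic molecules.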

The models developed in this vein are instrumental in investigating the interactions of molecules for which no structural data is available. But such models often fall short of providing complete answers, which we illustrate with the folding problem. On the one hand, as the conformations of side-chains belong to discrete sets (the so-called rotamers or rotational isomers), the number of distinct conformations of a polypeptide chain is exponential in the number of amino-acids. On the other hand, Nature folds proteins within time scales ranging from milliseconds to hours, which is out of reach for simulations. The fact that Nature avoids the exponential trap is known as Levinthal's paradox. The intrinsic difficulty of these problems calls for models exploiting several classes of information. For small systems, *ab initio* models can be built from first principles. But for more complex systems, *homology* or template-based models integrating a variable amount of knowledge acquired on similar systems are resorted to.
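The exponential growth behind Levinthal's paradox can be illustrated by a back-of-the-envelope calculation. The sketch below assumes three discrete states per residue, a chain of 100 residues and a sampling rate of 10^13 conformations per second; all three figures are illustrative assumptions, not values taken from this report.

```python
import math


def conformation_count(n_residues: int, states_per_residue: int) -> int:
    """Number of conformations if each residue picks one of k discrete states."""
    return states_per_residue ** n_residues


def exhaustive_search_years(n_conformations: int, samples_per_second: float) -> float:
    """Time needed to enumerate every conformation at a fixed rate, in years."""
    seconds_per_year = 365.25 * 24 * 3600
    return n_conformations / samples_per_second / seconds_per_year


n = conformation_count(n_residues=100, states_per_residue=3)
years = exhaustive_search_years(n, samples_per_second=1e13)
print(f"conformations: about 10^{math.log10(n):.1f}")
print(f"exhaustive search: about 10^{math.log10(years):.1f} years")
```

Under these assumptions the enumeration involves on the order of 10^47 conformations, far beyond the millisecond to hour time scales observed for folding.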

The variety of approaches developed is illustrated by the two community-wide experiments CASP (*Critical Assessment of Techniques for Protein Structure Prediction*; http://) and CAPRI (*Critical Assessment of Prediction of Interactions*; http://).

As illustrated by the previous discussion, modeling macro-molecules touches upon biology, physics and chemistry, as well as mathematics and computer science. In the following, we present the topics investigated within ABS.

Our key achievements in 2011 have been twofold.

First, following our work on Voronoi interface models, one of our long-standing goals has been to provide a unified model for atomic resolution protein interfaces. We took our Voronoi based modeling approach one step further, by developing a parametric model of protein binding patches, amenable to structure comparison. This model may be seen as a parametric *core-rim* model refining the classical binary core-rim model. It encompasses both geometric and topological properties, and allows the investigation of the topology of binding patches, a dimension ignored so far. Moreover, the topological information also makes the model amenable to structure comparison, a topic hardly touched at the atomic level (the problem is in fact NP-hard). This model is currently being used to perform a detailed analysis of antibody-antigen complexes, with a view to understanding the relationship between the amino-acid variability of immunoglobulins and their binding affinity.

Second, a recent achievement has been the design of an algorithm to compute so-called compoundly-weighted Voronoi diagrams, in the context of TOleranced Models (TOM). Recall that the TOM framework is meant to accommodate uncertainties on the shapes and the positions of proteins within large protein assemblies. In 2011, we fully exploited the TOM framework to analyze qualitative reconstructions of the Nuclear Pore Complex (NPC), the largest protein assembly known to date in the eukaryotic cell. This work was carried out in collaboration with V. Doye, from Inst. Jacques Monod, Paris, a renowned expert of the NPC.

We believe that the TOM framework and the accompanying statistics should prove of general interest for the problem of reconstructing macro-molecular assemblies and that of assessing such reconstructions.

The research conducted by ABS focuses on two main directions in Computational Structural Biology (CSB), each such direction calling for specific algorithmic developments. These directions are:

- Modeling interfaces and contacts,

- Modeling the flexibility of macro-molecules.

**Problems addressed.** The Protein Data Bank,
http://
*interacting* with atoms of the second one. Understanding the structure of interfaces is central to understanding biological complexes and thus the function of biological molecules. Yet, in spite of almost three decades of investigations, the basic principles guiding the formation of interfaces and accounting for their stability are unknown. Current investigations follow two routes. From the experimental perspective, directed mutagenesis enables one to quantify the energetic importance of residues, important residues being termed *hot* residues. Such studies recently evidenced the *modular* architecture of interfaces. From the modeling perspective, the main issue consists of guessing the hot residues from sequence and/or structural information.

The description of interfaces is also of special interest to improve
*scoring functions*. By scoring function, two things are meant: either a function which assigns to a complex a quantity homogeneous to a free energy change

**Methodological developments.** Describing interfaces poses problems in two settings: static and dynamic.

In the static setting, one seeks the minimalist geometric model providing a relevant bio-physical signal. A first step in doing so consists of identifying interface atoms, so as to relate the geometry and the bio-chemistry at the interface level. To elaborate at the atomic level, one seeks a structural alphabet encoding the spatial structure of proteins. At the side-chain and backbone level, an example of such an alphabet has been proposed. At the atomic level, and in spite of recent observations on the local structure of the neighborhood of a given atom, no such alphabet is known. Specific important local conformations are known, though. One of them is the so-called dehydron structure, which is an under-desolvated hydrogen bond, a property that can be directly inferred from the spatial configuration of the

A structural alphabet at the atomic level may be seen as an alphabet featuring, for an atom of a given type, all the conformations this atom may engage in, depending on its neighbors. One way to tackle this problem consists of extending the notions of molecular surfaces used so far, so as to encode multi-body relations between an atom and its neighbors. In order to derive such alphabets, two strategies are natural. On the one hand, one may use an encoding of neighborhoods based on geometric constructions such as Voronoi diagrams (affine or curved) or arrangements of balls. On the other hand, one may resort to clustering strategies in higher dimensional spaces, as the

In the dynamic setting, one wishes to understand whether selected (hot) residues exhibit specific dynamic properties, so as to serve as anchors in a binding process. More generally, any significant observation raised in the static setting deserves investigations in the dynamic setting, so as to assess its stability. Such questions are also related to the problem of correlated motions, which we discuss next.

Large protein assemblies such as the Nuclear Pore Complex (NPC), chaperonin cavities, the proteasome or ATP synthases, to name a few, are key to numerous biological functions. To improve our understanding of these functions, one would ideally like to build and animate atomic models of these molecular machines. However, this task is especially tough, due to their size and their plasticity, but also due to the flexibility of the proteins involved. In a sense, the modeling challenges arising in this context are different from those faced for binary docking, and also from those encountered for intermediate size complexes, which are often amenable to a processing mixing (cryo-EM) image analysis and classical docking. To face these new challenges, an emerging paradigm is that of reconstruction by data integration. In a nutshell, the strategy is reminiscent of NMR and consists of mixing experimental data from a variety of sources, so as to find out the model(s) best complying with the data. This strategy has been used in particular to propose plausible models of the Nuclear Pore Complex, the largest assembly known to date in the eukaryotic cell, consisting of 456 protein *instances* of 30 *types*.

Reconstruction by data integration requires three ingredients. First, a parametrized model must be adopted, typically a collection of balls to model a protein with pseudo-atoms. Second, as in NMR, a functional measuring the agreement between a model and the data must be chosen. In the NPC reconstruction mentioned above, this functional is based upon *restraints*, namely penalties associated with the experimental data. Third, an optimization scheme must be selected. The design of restraints is notoriously challenging, due to the ambiguous nature and/or the noise level of the data. For example, Tandem Affinity Purification (TAP) gives access to a *pullout*, i.e. a list of protein types which are known to interact with one tagged protein type, but no information on the number of complexes or on the stoichiometry of protein types within a complex is provided. In cryo-EM, the envelope enclosing an assembly is often imprecisely defined, in particular in regions of low density. For immuno-EM labelling experiments, positional uncertainties arise from the microscope resolution.
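The three ingredients (a ball model, a restraint-based functional, an optimization scheme) can be sketched in a few lines. The code below is an illustration under stated assumptions: the quadratic penalties, the upper-bound distance restraints and the greedy random search are hypothetical choices made for brevity, and are not claimed to be those used in the published reconstructions.

```python
import random
from math import dist


def score(centers, radii, restraints):
    """Sum of quadratic penalties: violated distance restraints plus ball overlaps."""
    total = 0.0
    # Upper-bound distance restraints (i, j, d_max), e.g. derived from a pullout.
    for i, j, d_max in restraints:
        excess = dist(centers[i], centers[j]) - d_max
        if excess > 0:
            total += excess ** 2
    # Excluded volume: two pseudo-atoms should not interpenetrate.
    n = len(centers)
    for i in range(n):
        for j in range(i + 1, n):
            overlap = radii[i] + radii[j] - dist(centers[i], centers[j])
            if overlap > 0:
                total += overlap ** 2
    return total


def optimize(centers, radii, restraints, steps=5000, step_size=1.0, seed=0):
    """Greedy random search: move one ball, keep the move if the score does not increase."""
    rng = random.Random(seed)
    centers = [tuple(c) for c in centers]
    best = score(centers, radii, restraints)
    for _ in range(steps):
        k = rng.randrange(len(centers))
        moved = tuple(x + rng.uniform(-step_size, step_size) for x in centers[k])
        trial = centers[:k] + [moved] + centers[k + 1:]
        s = score(trial, radii, restraints)
        if s <= best:
            centers, best = trial, s
    return centers, best


# Toy data: three pseudo-atoms of radius 2, initially far apart, two proximity restraints.
start = [(0.0, 0.0, 0.0), (30.0, 0.0, 0.0), (0.0, 30.0, 0.0)]
radii = [2.0, 2.0, 2.0]
restraints = [(0, 1, 5.0), (1, 2, 5.0)]
final, final_score = optimize(start, radii, restraints)
print(score(start, radii, restraints), final_score)
```

Running the search with several seeds yields several low-score models, which mirrors the fact that a non-convex functional and ambiguous data lead to a set of plausible reconstructions rather than a single one.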

These uncertainties, coupled with the complexity of the functional being optimized, which in general is non-convex, have two consequences. First, it is impossible to single out a unique reconstruction, and a set of plausible reconstructions must be considered. As an example, 1000 plausible models of the NPC have been reported. Interestingly, averaging the positions of all balls of a particular protein type across these models resulted in 30 so-called *probability density maps*, each such map encoding the probability of presence of a particular protein type at a particular location in the NPC. Second, the assessment of all models (individual and averaged) is non-trivial. In particular, the lack of straightforward statistical analysis of the individual models and the absence of assessment for the averaged models are detrimental to the mechanistic exploitation of the reconstruction results. At this stage, such models therefore remain qualitative.
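One simple way to picture a probability density map is to bin the balls of one protein type on a voxel grid and to record, for each voxel, the fraction of models that place a ball there. The sketch below follows this idea; the voxel size, the data layout and the coordinates are assumptions made for illustration (only the protein names are real NPC components), and the procedure used in the published maps may differ.

```python
from collections import Counter


def density_map(models, protein_type, voxel=10.0):
    """Fraction of models placing a ball of `protein_type` in each voxel.

    `models` is a list of models; each model is a list of (type, (x, y, z)) balls.
    """
    counts = Counter()
    for model in models:
        # Count each voxel at most once per model so that values stay in [0, 1].
        voxels = {
            tuple(int(c // voxel) for c in xyz)
            for ptype, xyz in model
            if ptype == protein_type
        }
        counts.update(voxels)
    return {v: n / len(models) for v, n in counts.items()}


# Toy data: two models, two protein types, made-up coordinates.
models = [
    [("Nup84", (1.0, 2.0, 3.0)), ("Nup85", (25.0, 0.0, 0.0))],
    [("Nup84", (4.0, 8.0, 1.0)), ("Nup85", (55.0, 0.0, 0.0))],
]
print(density_map(models, "Nup84"))
print(density_map(models, "Nup85"))
```

In this toy example the first type is always found in the same voxel, whereas the second one is split between two voxels, which is the kind of spread that makes averaged models hard to interpret quantitatively.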

As outlined by the previous discussion, a number of methodological developments are called for. On the experimental side, the problem of fostering the interpretation of data is under scrutiny. Of particular interest is the disambiguation of proteomics signals (TAP data, mass spectrometry data), and that of density maps coming from electron microscopy. As for modeling, two classes of developments are particularly stimulating. The first one is concerned with the design of algorithms performing reconstruction by data integration. The second one encompasses assessment tools, in order to single out the reconstructions which best comply with the experimental data.

**Problems addressed.** Proteins in vivo vibrate at various frequencies: high frequencies correspond to small amplitude deformations of chemical bonds, while low frequencies characterize more global deformations. This flexibility contributes to the entropy and thus to the `free energy` of the system *protein - solvent*. From the experimental standpoint, NMR studies and Molecular Dynamics simulations generate ensembles of conformations, called `conformers`. Of particular interest while investigating flexibility is the notion of correlated motion. Intuitively, when a protein is folded, all atomic movements must be correlated, a constraint which gets alleviated when the protein unfolds since the steric constraints get relaxed
*diffusion - conformer selection - induced fit* complex formation model.

Parameterizing these correlated motions, describing the corresponding energy landscapes, as well as handling collections of conformations pose challenging algorithmic problems.

**Methodological developments.** At the side-chain level, the question of improving rotamer libraries is still of interest. This question is essentially a clustering problem in the parameter space describing the side-chain conformations.

At the atomic level, flexibility is essentially investigated with methods based on a classical potential energy (molecular dynamics) and on (inverse) kinematics. A molecular dynamics simulation provides a point cloud sampling the conformational landscape of the molecular system investigated, as each step in the simulation corresponds to one point in the parameter space describing the system (the conformational space). The standard methodology to analyze such a point cloud consists of resorting to normal modes. Recently, though, more elaborate methods resorting to more local analysis, to Morse theory and to the analysis of meta-stable states of time series have been proposed.
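As a rough illustration of how a dominant collective direction can be extracted from such a point cloud, the sketch below computes the leading principal component of a set of conformations by power iteration. This is a data-driven stand-in, not a normal mode analysis proper (which diagonalizes the Hessian of the potential energy); the function name and the toy trajectory are assumptions.

```python
def dominant_mode(points, iterations=200):
    """Leading principal direction of a point cloud and the variance along it."""
    n, d = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    centered = [[p[k] - mean[k] for k in range(d)] for p in points]
    # Covariance matrix of the centered cloud.
    cov = [
        [sum(c[a] * c[b] for c in centered) / n for b in range(d)]
        for a in range(d)
    ]
    # Power iteration; the start vector is arbitrary and must not be
    # orthogonal to the leading eigenvector.
    v = [1.0 / (k + 1) for k in range(d)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # degenerate cloud: all points coincide
        v = [x / norm for x in w]
    variance = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)
    )
    return v, variance


# Toy trajectory: large drift along the first coordinate, small jitter along the second.
points = [(float(t), 0.1 * (-1) ** t) for t in range(10)]
axis, variance = dominant_mode(points)
print(axis, variance)
```

On the toy trajectory the returned direction is almost aligned with the first coordinate, along which nearly all the variance is concentrated.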

Given a sampling on an energy landscape, a number of fundamental issues actually arise: how does the point cloud describe the topography of the energy landscape (a question reminiscent of Morse theory)? Can one infer the effective number of degrees of freedom of the system over the simulation, and is this number varying? Answers to these questions would be of major interest to refine our understanding of folding and docking, with applications to the prediction of structural properties. It should be noted in passing that such questions are probably related to modeling phase transitions in statistical physics, where geometric and topological methods are being used.

From an algorithmic standpoint, such questions are reminiscent of *shape learning*. Given a collection of samples on an (unknown) *model*, *learning* consists of guessing the model from the samples; the result of this process may be called the *reconstruction*. In doing so, two types of guarantees are sought: topologically speaking, the reconstruction and the model should (ideally!) be isotopic; geometrically speaking, their Hausdorff distance should be small. Motivated by applications in Computer Aided Geometric Design, surface reconstruction triggered a major activity in the Computational Geometry community over the past ten years. Aside from applications, reconstruction raises a number of deep issues: the study of distance functions to the model and to the samples, and their comparison; the study of Morse-like constructions stemming from distance functions to points; the analysis of topological invariants of the model and the samples, and their comparison.
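The geometric guarantee mentioned above relies on the Hausdorff distance. For finite point sets it can be computed directly from its definition, as in the following minimal sketch (a quadratic-time illustration; the point sets are made-up toy data).

```python
from math import dist


def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Toy data: the corners of a unit square and a perturbed sample with one extra point.
model = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
reconstruction = [(0.0, 0.1), (1.0, 0.0), (0.9, 1.0), (0.0, 1.0), (0.5, 0.5)]
print(hausdorff(model, reconstruction))
```

The two directed distances differ here: every model point has a close neighbor in the reconstruction, but the extra central point of the reconstruction is far from every model point, which is why the symmetric version takes the maximum of both.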

Last but not least, gaining insight on such questions would also help to effectively select a reduced set of conformations best representing a larger number of conformations. This selection problem is indeed faced by flexible docking algorithms that need to maintain and/or update collections of conformers for the second stage of the *diffusion - conformer selection - induced fit* complex formation model.
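One classical heuristic for this selection problem is greedy farthest-point sampling, which repeatedly adds the conformation farthest from those already chosen. The sketch below assumes that each conformer is given as a flat vector of parameters (for instance dihedral angles or superposed coordinates) compared with the Euclidean distance; both the encoding and the function name are illustrative assumptions, not the method of a specific docking program.

```python
from math import dist


def select_representatives(conformers, k):
    """Indices of k conformers chosen by greedy farthest-point selection."""
    chosen = [0]  # arbitrary starting conformer
    # Distance from every conformer to its closest representative so far.
    closest = [dist(c, conformers[0]) for c in conformers]
    while len(chosen) < min(k, len(conformers)):
        nxt = max(range(len(conformers)), key=closest.__getitem__)
        chosen.append(nxt)
        closest = [
            min(d, dist(c, conformers[nxt]))
            for d, c in zip(closest, conformers)
        ]
    return chosen


# Toy data: three loose groups of conformations in a two-dimensional parameter space.
conformers = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
print(select_representatives(conformers, 3))
```

On the toy data the three selected conformers fall in three different groups, so that every discarded conformation lies close to one representative.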

This section briefly comments on all the software distributed by ABS. On the one hand, the software released in 2011 is briefly described as the context is presented in the sections dedicated to new results. On the other hand, the software made available before 2011 is briefly specified in terms of applications targeted.

In any case, the web page advertising a given software also makes related publications available.

**Context.** Our work on the problem of modeling and comparing atomic resolution protein interfaces is discussed in other sections of this report. The programs undertaking these two tasks are respectively named `vorpatch` and `compatch`.

**Distribution.** Binaries for `vorpatch` and `compatch` are available from
http://cgal.inria.fr/abs/vorpatch-compatch/.

**Context.** Our TOleranced Model framework is described in other sections of this report. The corresponding software package includes programs to (i) perform the segmentation of (probability) density maps, (ii) construct toleranced models, (iii) explore toleranced models (geometrically and topologically), (iv) compute Maximal Common Induced Sub-graphs (MCIS) and Maximal Common Edge Sub-graphs (MCES) to assess the pairwise contacts encoded in a TOM.

**Distribution.** Binaries for the aforementioned programs are made available from
http://cgal.inria.fr/abs/voratom/.

**Context.** Given a snapshot of a molecular dynamics simulation, a classical problem consists of *quenching* that structure, that is, minimizing the potential energy of the solute together with selected layers of solvent molecules. The program `wsheller` provides a solution to the water layer selection, and incorporates a topological control of the layers selected.

**Distribution.** Binaries for `wsheller` are available from
http://cgal.inria.fr/abs/wsheller/.

In collaboration with S. Loriot, from the Geometry Factory.

**Context.** Modeling the interfaces of macro-molecular complexes is key to improving our understanding of the stability and specificity of such interactions. We proposed a simple parameter-free model for macro-molecular interfaces, which enables a multi-scale investigation, from the atomic scale to the whole interface scale. Our interface model improves the state of the art to (i) identify interface atoms, (ii) define interface patches, (iii) assess the interface curvature, (iv) investigate correlations between the interface geometry and water dynamics / conservation patterns / polarity of residues.

**Distribution.** The following web site
http://cgal.inria.fr/abs/Intervor serves two
purposes: on the one hand, calculations can be run from the web site; on the other hand, binaries are distributed for Linux. To the best of our knowledge, this software is the only publicly
available one for analyzing Voronoi interfaces in macro-molecular complexes.

In collaboration with S. Loriot, from the Geometry Factory.

**Context.** Molecular surfaces and volumes are paramount to molecular modeling, with applications to electrostatic and energy calculations, interface modeling, scoring and model evaluation, pocket and cavity detection, etc. However, for molecular models represented by collections of balls (Van der Waals and solvent accessible models), such calculations are challenging, in particular regarding numerics. Because all available programs overlook numerical issues, which in particular prevents them from qualifying the accuracy of the results returned, we developed the first certified algorithm, called `vorlume`. This program is based on so-called certified predicates to guarantee the branching operations of the program, as well as interval arithmetic to return an interval certified to contain the exact value of each statistic of interest, in particular the exact surface area and the exact volume of the molecular model processed.
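The principle of returning a certified interval can be illustrated on the volume of a single ball. The sketch below implements a tiny interval type whose operations round outward by one unit in the last place, using `math.nextafter` (Python 3.9 or later). It is an illustration of interval arithmetic only: the class `Interval` and the function `ball_volume` are hypothetical and are unrelated to the actual implementation of the program described above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    @staticmethod
    def around(x: float) -> "Interval":
        """Interval spanning the two floating-point neighbors of x."""
        return Interval(math.nextafter(x, -math.inf), math.nextafter(x, math.inf))

    def __add__(self, other):
        # Round-to-nearest errs by at most half an ulp, so one outward step encloses.
        return Interval(
            math.nextafter(self.lo + other.lo, -math.inf),
            math.nextafter(self.hi + other.hi, math.inf),
        )

    def __mul__(self, other):
        products = [
            self.lo * other.lo, self.lo * other.hi,
            self.hi * other.lo, self.hi * other.hi,
        ]
        return Interval(
            math.nextafter(min(products), -math.inf),
            math.nextafter(max(products), math.inf),
        )

    @property
    def width(self) -> float:
        return self.hi - self.lo


PI = Interval.around(math.pi)  # math.pi is the float nearest to pi


def ball_volume(radius: Interval) -> Interval:
    """Interval enclosing 4/3 * pi * r^3 for every r in `radius`."""
    four_thirds = Interval.around(4.0 / 3.0)
    return four_thirds * PI * radius * radius * radius


# A ball whose radius is exactly the float 1.7 (degenerate input interval).
v = ball_volume(Interval(1.7, 1.7))
print(v.lo, v.hi, v.width)
```

The width of the returned interval quantifies the numerical uncertainty of the result, which is the kind of accuracy statement that plain floating-point code cannot provide.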

**Distribution.** Binaries for `Vorlume` are available from
http://cgal.inria.fr/abs/Vorlume.

In collaboration with S. Loriot (the Geometry Factory), and J. Bernauer, from the EPI AMIB.

**Context.** The ESBTL (Easy Structural Biology Template Library) is a lightweight C++ library that allows the handling of PDB data and provides a data structure suitable for geometric
constructions and analyses.

**Distribution.** The C++ source code is available from
http://esbtl.sourceforge.net/.

In collaboration with N. Yanev, University of Sofia, and IMI at Bulgarian Academy of Sciences, Bulgaria, and R. Andonov, INRIA Rennes - Bretagne Atlantique, and IRISA/University of Rennes 1, France.

**Context.** Structural similarity between proteins provides significant insights about their functions. Contact Map Overlap maximization (CMO) received sustained attention during the past decade and can be considered today as a credible measure of protein structure similarity. The solver `A_purva` is an exact CMO solver that is both efficient (notably faster than the previous exact algorithms), and reliable (providing accurate upper and lower bounds of the solution). These properties make it applicable for large-scale protein comparison and classification.
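To fix ideas about the quantity maximized in CMO, the sketch below builds a contact map from residue coordinates and counts the contacts preserved by a given alignment. It only evaluates the objective for one alignment and does not search for the optimal one, which is the hard part handled by exact solvers. The 7.5 angstrom cutoff, the function names and the toy chains are assumptions made for illustration.

```python
from math import dist


def contact_map(coords, cutoff=7.5):
    """Set of residue index pairs (i, j), i < j, closer than `cutoff` angstroms."""
    n = len(coords)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if dist(coords[i], coords[j]) < cutoff
    }


def overlap(contacts_a, contacts_b, alignment):
    """Number of contacts of A sent onto contacts of B by `alignment`.

    `alignment` maps residue indices of A to residue indices of B and is
    expected to be injective and order-preserving.
    """
    count = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            u, v = alignment[i], alignment[j]
            if (min(u, v), max(u, v)) in contacts_b:
                count += 1
    return count


# Toy data: two straight chains with residues spaced 3.8 angstroms apart.
chain_a = [(3.8 * k, 0.0, 0.0) for k in range(4)]
chain_b = [(3.8 * k, 0.0, 0.0) for k in range(5)]
contacts_a, contacts_b = contact_map(chain_a), contact_map(chain_b)
print(overlap(contacts_a, contacts_b, {0: 1, 1: 2, 2: 3, 3: 4}))
print(overlap(contacts_a, contacts_b, {0: 0, 1: 2, 2: 4}))
```

The first alignment preserves every contact of the shorter chain, whereas the second one, which skips residues, preserves none.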

**Distribution.** The software is available from
http://apurva.genouest.org.

In collaboration with A. Bansal, former summer intern from IIT Bombay.

Understanding the specificity of protein interactions is a central question in structural biology, whence the importance of models for protein binding patches, a patch being the collection of atoms of a given partner accounting for the interaction. To improve our understanding of the relationship between the structure of binding patches and the biological function of protein complexes, we present a binding patch model decoupling the topological and geometric properties. While the geometry is classically encoded by the 3D positions of the atoms, the topology is recorded in a graph encoding the relative position of concentric shells partitioning the interface atoms. The topological-geometric duality provides the basis of a generic dynamic programming based algorithm to compare patches, which is instantiated to respectively favor topological or geometric comparisons.

On the biological side, using a dataset of 92 co-crystallized structures organized in biological sub-families, we exploit our encoding and the two comparison algorithms in two directions. First, we show that Nature enjoyed the topological and geometric degrees of freedom independently while retaining a finite set of qualitatively distinct topological signatures, and show that topological similarity is a less stringent notion than the ubiquitously used geometric similarity. Second, we analyze the topological and geometric coherence of binding patches within sub-families and across the whole database, and show that complexes related to the same biological function can encompass geometrically distinct shapes. Previous work on binding patches focused on the investigation of correlations between structural parameters and biochemical properties on the one hand, and on structural comparison algorithms on the other hand. We believe that the abstraction coded by the topological-geometric duality paves the way to new classifications, in particular in the context of flexible docking.

The corresponding software is presented in the software section of this report.

In collaboration with Valérie Doye, Institut Jacques Monod, Paris.

Reconstruction by data integration is an emerging trend to reconstruct large protein assemblies, but uncertainties on the input data yield average models whose quantitative interpretation is challenging. This paper presents methods to probe fuzzy models of large assemblies against atomic resolution models of sub-systems.

Consider a Toleranced Model (TOM) of a macro-molecular assembly, namely a continuum of nested shapes representing the assembly at multiple scales. Also consider a template, namely an atomic resolution 3D model of a sub-system of this assembly, also called a complex. We present algorithms performing a multi-scale assessment of the complexes of the TOM, by comparing the pairwise contacts which appear in the TOM against those of the template. These operations reduce to the comparison of graphs, which we perform by computing Maximal Common Induced Sub-graphs (MCIS) and Maximal Common Edge Sub-graphs (MCES).
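The comparison of contact graphs can be illustrated by a brute-force computation of the size of a maximum common edge sub-graph, here not required to be connected. The sketch below assumes that the nodes of each graph carry a protein type and that two nodes may be matched only if their types agree; it enumerates all injective mappings and is therefore usable on tiny graphs only. The function name, the data layout and the toy graphs are hypothetical, and this is not the algorithm implemented in the software of the team.

```python
from itertools import permutations


def mces_size(types_a, edges_a, types_b, edges_b):
    """Maximum number of edges of A sent onto edges of B.

    Node mappings are injective; an edge counts only if both of its endpoints
    are matched to nodes of the same type. Exhaustive search, exponential cost.
    """
    nodes_a, nodes_b = list(types_a), list(types_b)
    if len(nodes_a) > len(nodes_b):
        # The quantity is symmetric: always map the smaller graph into the larger.
        return mces_size(types_b, edges_b, types_a, edges_a)
    undirected_b = {frozenset(e) for e in edges_b}
    best = 0
    for image in permutations(nodes_b, len(nodes_a)):
        mapping = dict(zip(nodes_a, image))
        common = sum(
            types_a[u] == types_b[mapping[u]]
            and types_a[v] == types_b[mapping[v]]
            and frozenset((mapping[u], mapping[v])) in undirected_b
            for u, v in edges_a
        )
        best = max(best, common)
    return best


# Toy data: a triangular template of three types versus a chain of four instances.
template_types = {"x": "A", "y": "B", "z": "C"}
template_edges = [("x", "y"), ("y", "z"), ("x", "z")]
tom_types = {0: "A", 1: "B", 2: "C", 3: "A"}
tom_edges = [(0, 1), (1, 2), (2, 3)]
print(mces_size(template_types, template_edges, tom_types, tom_edges))
```

In this toy example at most two of the three template contacts can be recovered, whichever of the two instances of type A is matched.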

We apply this machinery to recent average models of the NPC. First, we show how our contact analysis allows assessing the quality of probability density maps. Regarding particular sub-systems of the NPC, we focus on the Y-complex and on the T-complex. In particular for the latter, our analysis suggests a new 3D template of pairwise contacts.

We believe that these tools should become standard to assess the reconstruction of fuzzy assemblies.

The software associated with these developments is presented in the software section of this report.

In collaboration with R. Andonov (IRISA), M. Le Boudic-Jamin (IRISA) and P. Kamath (former summer intern within the Symbiose project at IRISA).

The 3D structure of macro-molecules underpins all biological functions. Similarities between protein structures may come from evolutionary relationships, and similar protein structures relate to similar functions.

The exponential growth of the number of known protein structures in the Protein Data Bank over the past decade led to the problem of protein classification. We mean here how to
automatically insert new protein structures into an already existing classified database

There are computational pitfalls in the FIP. The number of similarity scores

We propose a notion of dominance between the protein structure comparison instances that allows the FIP to be solved optimally without optimally solving all the comparison instances, and thus reduces the effect of the NP-hardness of the similarity score.

Consider a protein complex involving two partners, the receptor and the ligand. We address the problem of comparing their binding patches, i.e. the sets of atoms accounting for their interaction. This problem has been classically addressed by searching quasi-isometric subsets of atoms within the patches, a task equivalent to a maximum clique problem, which is NP-hard, so that practical binding patches involving up to 300 atoms cannot be handled. We extend previous work in two directions. First, we present a generic encoding of shapes represented as cell complexes. We partition a shape into concentric shells, based on the shelling order of the cells of the complex. The shelling order yields a shelling tree encoding the geometry and the topology of the shape. Second, for the particular case of cell complexes representing protein binding patches, we present three novel shape comparison algorithms. These algorithms combine a Tree Edit Distance calculation (TED) on shelling trees, together with Edit operations respectively favoring a topological or a geometric comparison of the patches. We show in particular that the geometric TED calculation strikes a balance, in terms of accuracy and running time, between purely geometric and purely topological comparisons, and we briefly comment on the biological findings reported in a companion paper.
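The partition into concentric shells can be pictured with a breadth-first traversal. The sketch below assumes that the shelling order of a cell is the smallest number of adjacency steps needed to reach the boundary of the patch, which is a simplifying assumption about the definition; the construction of the shelling tree and the tree edit distance are not covered. The function names and the toy adjacency are illustrative.

```python
from collections import deque


def shelling_order(adjacency, boundary):
    """Multi-source BFS: shell index = fewest adjacency steps to a boundary cell."""
    order = {cell: 0 for cell in boundary}
    queue = deque(boundary)
    while queue:
        cell = queue.popleft()
        for neighbor in adjacency[cell]:
            if neighbor not in order:
                order[neighbor] = order[cell] + 1
                queue.append(neighbor)
    return order


def shells(order):
    """Group cells by shelling order, outermost shell first."""
    grouped = {}
    for cell, k in order.items():
        grouped.setdefault(k, []).append(cell)
    return [sorted(grouped[k]) for k in sorted(grouped)]


# Toy data: five cells in a row, the two end cells lying on the boundary.
adjacency = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
order = shelling_order(adjacency, boundary=["a", "e"])
print(shells(order))
```

The outermost shell plays the role of the rim and the innermost one that of the core, the intermediate shells providing the finer, parametric description.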

Title: Computational Geometric Learning (CGL)

Type: COOPERATION (ICT)

Challenge: FET Open

Instrument: Specific Targeted Research Project (STREP)

Duration: November 2010 - October 2013

Coordinator: Friedrich-Schiller-Universität Jena (Germany)

Other partners: Jena Univ. (coord.), INRIA (Geometrica Sophia, Geometrica Saclay, ABS), Tech. Univ. of Dortmund, Tel Aviv Univ., Nat. Univ. of Athens, Univ. of Groningen, ETH Zürich, Freie Univ. Berlin.

See also: http://cglearning.eu/

Abstract:
*The Computational Geometric Learning project aims at extending the success story of geometric algorithms with guarantees to high-dimensions. This is not a straightforward task. For
many problems, no efficient algorithms exist that compute the exact solution in high dimensions. This behavior is commonly called the curse of dimensionality. We try to address the
curse of dimensionality by focusing on inherent structure in the data like sparsity or low intrinsic dimension, and by resorting to fast approximation algorithms.*

Shah Pararth (from May 2011 until July 2011)

Subject: Geometric optimization algorithms for collections of balls.

Institution: IIT Bombay (India)

Deepankar Reddy (from May 2011 until July 2011)

Subject: On relaxation techniques for the maximum clique problem.

Institution: IIT Bombay (India)

– F. Cazals was a member of the following program committees:

Symposium on Geometry Processing.

SIAM Conference on Geometric and Physical Modeling.

International conference on Pattern Recognition in Bioinformatics.

– F. Cazals acted as *rapporteur* of the following habilitation defense:

Dave Ritchie, University of Nancy, April 2011, *Rapporteur*. Habilitation memoir on *High performance algorithms for molecular shape recognition*.

– F. Cazals is a member of the scientific committee of *GDR Bio-informatique-Moléculaire*, in charge of activities related to computational structural biology.

– F. Cazals is a member of the scientific committee of the exhibition *Leonard de Vinci: la Nature et l'Invention*, Cité des Sciences.

– Until September 2011, F. Cazals was coordinating, together with Pierre Kornprobst, the Master of Science in Computational Biology and Medicine, http://cbb.unice.fr.

**(Master)** Ecole Centrale Paris, France, 3rd year of the engineering curriculum in applied mathematics. Course on *Geometric and topological modeling with applications in biophysics*, taught by F. Cazals (24h).

**(Master)** Université de Nice Sophia Antipolis, France, Master of Science in Computational Biology (second year). Course on *Algorithmic Problems in Computational Structural Biology*, taught by F. Cazals (24h).

**(PhD thesis, defended)** T. Dreyfus, *Assessing the Reconstruction of Macro-molecular Assemblies: the Example of the Nuclear Pore Complex*, Université de Nice Sophia Antipolis, defended on December 20th. Advisor: F. Cazals.

**(PhD thesis, ongoing)** C. Roth, *Modeling the flexibility of macro-molecules: theory and applications*, Université de Nice Sophia Antipolis. Advisor: F. Cazals.

**(PhD thesis, ongoing)** A. Lheritier, *Scoring and discriminating in high-dimensional spaces: a geometric based approach of statistical tests*, Université de Nice Sophia Antipolis. Advisor: F. Cazals.

**(PhD thesis, ongoing)** D. Agarwal, *Towards nano-molecular design: advanced algorithms for modeling large protein assemblies*, Université de Nice Sophia Antipolis. Advisor: F. Cazals.