## Section: Scientific Foundations

### Modeling the Flexibility of Macromolecules

Folding, docking, energy landscapes, induced fit, molecular dynamics, conformers, conformer ensembles, point clouds, reconstruction, shape learning, Morse theory

**Problems addressed.**
Proteins in vivo vibrate at various frequencies: high frequencies
correspond to small-amplitude deformations of chemical bonds, while
low frequencies characterize more global deformations. This
flexibility contributes to the entropy, and thus to the *free energy*, of
the *protein - solvent* system. In practice,
NMR studies and Molecular Dynamics simulations generate ensembles of
conformations, called *conformers*.
Of particular interest when investigating flexibility is the notion
of correlated motion. Intuitively, when a protein is folded, all
atomic movements must be correlated, a constraint that is
loosened when the protein unfolds, since the steric constraints are
relaxed (assuming that local forces are prominent, which in turn
presupposes that electrostatic interactions are not).
Understanding correlations is of special interest for predicting the
folding pathway that leads a protein towards its native state.
A similar discussion holds for partners within a complex,
for example in the third step of the *diffusion - conformer
selection - induced fit* complex formation model.

Parameterizing these correlated motions, describing the corresponding energy landscapes, and handling collections of conformations all pose challenging algorithmic problems.

**Methodological developments.**
At the side-chain level, the question of improving rotamer libraries
is still of interest [29]. This question is
essentially a clustering problem in the parameter space describing
side-chain conformations.
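
To make the clustering view concrete, here is a minimal sketch of Lloyd-style clustering on the circle for a single side-chain dihedral angle. It is an illustration only, not the method of [29]: the function names, the restriction to one angle, and the initial centers are all assumptions made for the example.

```python
import math

def circular_diff(a, b):
    """Signed smallest difference a - b in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def circular_mean(angles):
    """Mean direction (degrees) of a non-empty list of angles in degrees."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))

def assign(angles, centers):
    """Index of the nearest center (circular distance) for each angle."""
    return [min(range(len(centers)),
                key=lambda i: abs(circular_diff(a, centers[i])))
            for a in angles]

def cluster_dihedrals(angles, centers, n_iter=20):
    """Alternate assignment and circular-mean update, as in k-means."""
    centers = list(centers)
    for _ in range(n_iter):
        labels = assign(angles, centers)
        for k in range(len(centers)):
            members = [a for a, lab in zip(angles, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = circular_mean(members)
    return centers, assign(angles, centers)

# Hypothetical chi1 samples (degrees) and hypothetical initial centers.
chi1 = [55.0, 65.0, 62.0, 175.0, -178.0, 179.0, -62.0, -58.0, -65.0]
centers, labels = cluster_dihedrals(chi1, centers=[60.0, 180.0, -60.0])
```

Working with circular differences and circular means matters here because angles such as 179 and -178 degrees are neighbors, which a plain Euclidean k-means on raw angle values would miss.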

At the atomic level, flexibility is mainly investigated with methods based on a classical potential energy (molecular dynamics) and on (inverse) kinematics. A molecular dynamics simulation provides a point cloud sampling the conformational landscape of the molecular system under study, since each step of the simulation corresponds to one point in the parameter space describing the system (the conformational space) [45]. The standard way to analyze such a point cloud is to resort to normal modes. More recently, more elaborate methods have been proposed, relying on more local analysis [41], on Morse theory [36], and on the analysis of metastable states of time series [37].
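
As an illustration of a global, mode-based analysis of such a point cloud, the sketch below extracts the dominant collective direction from the covariance matrix of the sampled conformations, using power iteration. This is a covariance-based (PCA-style) analysis, which is an assumption about the kind of analysis meant here and is not the same as a Hessian-based normal mode computation. The function names are hypothetical, and the conformations are assumed to be already superposed, so that rigid-body motion has been removed.

```python
import math

def covariance(points):
    """Sample covariance matrix of a list of equal-length coordinate tuples."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_mode(cov, n_iter=200):
    """Dominant eigenpair of a symmetric PSD matrix by power iteration.

    The uniform start vector is assumed not to be orthogonal to the
    dominant eigenvector; a random start would be safer in general.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero covariance: no motion to describe
            break
        v = [x / norm for x in w]
    # Rayleigh quotient: variance captured along the mode v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v

# Toy "trajectory": five conformations moving along the direction (1, 2, 0).
trajectory = [(t, 2.0 * t, 0.0) for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
variance, mode = leading_mode(covariance(trajectory))
```

In this toy case all the variance lies along a single direction, so one mode describes the whole cloud; for a real simulation the remaining modes would be obtained by deflation or by a full eigendecomposition.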

Given a sampling of an energy landscape, a number of fundamental questions arise: how does the point cloud describe the topography of the energy landscape (a question reminiscent of Morse theory)? Can one infer the effective number of degrees of freedom of the system over the simulation, and does this number vary? Answers to these questions would be of major interest for refining our understanding of folding and docking, with applications to the prediction of structural properties. Note in passing that such questions are probably related to the modeling of phase transitions in statistical physics, where geometric and topological methods are being used [40].
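
One simple proxy for the effective number of degrees of freedom is the participation ratio of the covariance spectrum, $(\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$, which equals $\mathrm{tr}(C)^2 / \mathrm{tr}(C^2)$ and therefore needs no eigendecomposition. The sketch below evaluates it over consecutive windows of a trajectory to see whether it varies. This is a linear, global estimate chosen for illustration, not the approach of the cited works; the function names and the windowing scheme are assumptions.

```python
def covariance(points):
    """Sample covariance matrix of a list of equal-length coordinate tuples."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def participation_ratio(points):
    """tr(C)^2 / tr(C^2): 1 for motion along a line, d for isotropic motion."""
    cov = covariance(points)
    d = len(cov)
    trace = sum(cov[i][i] for i in range(d))
    # For a symmetric matrix, tr(C^2) is the sum of squared entries.
    frob2 = sum(cov[i][j] ** 2 for i in range(d) for j in range(d))
    return trace * trace / frob2 if frob2 > 0.0 else 0.0

def windowed_participation_ratio(trajectory, window, step):
    """Participation ratio over successive windows of the trajectory."""
    return [participation_ratio(trajectory[s:s + window])
            for s in range(0, len(trajectory) - window + 1, step)]

# Toy trajectory: first confined to a line, then spreading in a plane.
line = [(t, 0.0, 0.0) for t in (-1.5, -0.5, 0.5, 1.5)]
plane = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
profile = windowed_participation_ratio(line + plane, window=4, step=4)
```

On this toy input the estimate moves from one to two effective degrees of freedom between the two windows. Being linear, it would overestimate the dimension of a curved low-dimensional manifold, which is one reason more local or topological methods are of interest.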

From an algorithmic standpoint, such questions are reminiscent of
*shape learning*. Given a collection of samples on an (unknown) *model*, *learning* consists of guessing the model from the samples;
the result of this process may be called the *reconstruction*. In doing so, two types of guarantees are sought:
topologically speaking, the reconstruction and the model should
(ideally!) be isotopic; geometrically speaking, their Hausdorff
distance should be small.
Motivated by applications in Computer Aided Geometric Design, surface
reconstruction triggered a major activity in the Computational
Geometry community over the past ten years [6].
Aside from applications, reconstruction
raises a number of deep issues:
the study of distance functions to the model and to the samples,
and their comparison [25];
the study of Morse-like constructions stemming from distance
functions to points [33];
the analysis of topological invariants of the model and the samples,
and their comparison [26], [27].
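
The geometric guarantee mentioned above can be made concrete for finite point sets. The following brute-force sketch computes the Hausdorff distance between two point clouds; in an actual reconstruction setting the model is unknown and continuous, so comparing two finite samples, as done here, is a simplifying assumption, and the function names are hypothetical.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point of a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Toy example: b contains a and one extra outlying point.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(0.0, 0.0), (1.0, 0.0), (1.0, 3.0)]
d_ab = directed_hausdorff(a, b)  # every point of a is also in b
d_h = hausdorff(a, b)            # dominated by the outlier (1, 3)
```

The asymmetry of the two directed distances shows why both directions are needed: a sample set can lie entirely on the model while still missing part of it. This version costs O(nm) distance evaluations; spatial indexing would be needed for large clouds.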

Last but not least, gaining insight into such questions would also help
to select a reduced set of conformations that best represents
a larger collection. This selection problem is indeed
faced by flexible docking algorithms, which need to maintain and/or
update collections of conformers for the second stage of the
*diffusion - conformer selection - induced fit* complex formation
model.
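
One classical way to pick a small set of representatives from a larger collection is greedy farthest-point selection, which gives a 2-approximation of the optimal k-center covering radius. The sketch below is offered as an illustration under assumptions, not as the selection scheme used by any particular docking algorithm: the function name is hypothetical, and the Euclidean distance on coordinate tuples stands in for a structural distance such as RMSD after superposition.

```python
import math

def select_representatives(conformers, k, distance=math.dist, first=0):
    """Greedy farthest-point selection of at most k representatives.

    Returns the chosen indices and the covering radius, i.e. the largest
    distance from any conformer to its nearest representative.
    """
    chosen = [first]
    # nearest[i]: distance from conformer i to its closest representative.
    nearest = [distance(conformers[first], c) for c in conformers]
    while len(chosen) < min(k, len(conformers)):
        nxt = max(range(len(conformers)), key=lambda i: nearest[i])
        if nearest[nxt] == 0.0:  # everything left duplicates a representative
            break
        chosen.append(nxt)
        nearest = [min(nearest[i], distance(conformers[nxt], conformers[i]))
                   for i in range(len(conformers))]
    return chosen, max(nearest)

# Toy "conformers" as points on a line, forming three groups.
conformers = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0), (10.0, 0.0)]
chosen, radius = select_representatives(conformers, k=3)
```

The returned covering radius gives a direct measure of how well the reduced set represents the full collection, which can be used to choose k.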