**Computational Biology and Computational Structural Biology.**
Understanding the lineage between species and the genetic drift of
genes and genomes, apprehending the control and feed-back loops
governing the behavior of a cell, a tissue, an organ or a body, and
inferring the relationship between the structure of biological
(macro)-molecules and their functions are amongst the major challenges
of modern biology. The investigation of these challenges is
supported by three types of data: genomic data, transcription and
expression data, and structural data.

Genomic data feature sequences of nucleotides on DNA and RNA
molecules, and are symbolic data whose processing falls in the realm
of Theoretical Computer Science: dynamic programming, algorithms on
texts and strings, graph theory dedicated to phylogenetic problems.
Transcription and expression data feature evolving concentrations of
molecules (RNAs, proteins, metabolites) over time, and fit in the formalism of
discrete and continuous dynamical systems, and of graph theory. The
exploration and the modeling of these data are covered by a rapidly
expanding research field termed *systems biology*.
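As a toy illustration of dynamic programming on sequences, the following sketch computes the Levenshtein edit distance between two nucleotide strings; it is a deliberately minimal stand-in for the scoring schemes used in actual sequence alignment.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two sequences, via dynamic programming."""
    m, n = len(a), len(b)
    # prev[j] holds the distance between a[:i-1] and b[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # match / substitution
        prev = cur
    return prev[n]

print(edit_distance("GATTACA", "GCATGCU"))  # -> 4
```

The quadratic-time table is the same mechanism underlying biologically weighted alignment algorithms such as Needleman-Wunsch; only the cost model differs.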
Structural data encode information about the 3D structures of
molecules (nucleic acids (DNA, RNA), proteins, small molecules) and their
interactions, and come from three main sources: X-ray
crystallography, NMR spectroscopy, and cryo-electron microscopy.
Ultimately, structural data should expand our understanding of how the
structure accounts for the function of macro-molecules —one of the
central questions in structural biology. This goal actually subsumes
two equally difficult challenges: *folding*, the process through
which a protein adopts its 3D structure, and *docking*, the process
through which two or several molecules assemble. Folding and docking
are driven by non-covalent interactions, and for complex systems, are
actually intertwined.
Apart from the bio-physical interests raised by these processes, two
different application domains are concerned: in fundamental biology,
one is primarily interested in understanding the machinery of the
cell; in medicine, applications to drug design are developed.

**Modeling in Computational Structural Biology.**
Acquiring structural data is not always possible: NMR is restricted to
relatively small molecules; membrane proteins do not crystallize, etc.
As a matter of fact, the number of genomes sequenced is of the order
of one thousand, which results in circa one million genes recorded in
the manually curated Swiss-Prot database.
On the other hand, the Protein Data Bank contains circa 90,000
structures. Thus, the paucity of structures with respect to the known
number of genes calls for modeling in structural biology, so as to
foster our understanding of the structure-to-function relationship.

Ideally, bio-physical models of macro-molecules should resort to quantum mechanics. While this is possible for small systems, say up to 50 atoms, large systems are investigated within the framework of the Born-Oppenheimer approximation, which stipulates that the nuclei and the electron cloud can be decoupled. Example force fields developed in this realm are AMBER, CHARMM, and OPLS. Of particular importance are Van der Waals models, where each atom is modeled by a sphere whose radius depends on the atom's chemical type. From a historical perspective, Richards, and later Connolly, while defining molecular surfaces and developing algorithms to compute them, established the connections between molecular modeling and geometric constructions. Remarkably, a number of difficult problems (e.g. additively weighted Voronoi diagrams) were touched upon in these early days.

The models developed in this vein are instrumental in investigating
the interactions of molecules for which no structural data is
available. But such models often fall short of providing complete
answers, which we illustrate with the folding problem. On the one hand, as
the conformations of side-chains belong to discrete sets (the
so-called rotamers or rotational isomers),
the number of distinct conformations of a poly-peptidic chain is
exponential in the number of amino-acids. On the other hand, Nature
folds proteins within time scales ranging from milliseconds to hours,
while the time-steps used in molecular dynamics simulations are of the
order of the femto-second, so that biologically relevant time-scales
are out of reach for simulations. The fact that Nature avoids the
exponential trap is known as Levinthal's paradox.
The intrinsic difficulty of these problems calls for models exploiting
several classes of information. For small systems, *ab initio*
models can be built from first principles. But for more complex
systems, *homology* or template-based models, integrating a variable amount of
knowledge acquired on similar systems, are resorted to.
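Levinthal's argument can be reproduced with a back-of-the-envelope computation; the numbers below (three rotameric states per residue, a 100-residue chain, one conformation visited per femtosecond) are illustrative assumptions only.

```python
# Back-of-the-envelope illustration of Levinthal's paradox.
states_per_residue = 3     # assumed rotameric states per residue
n_residues = 100           # assumed chain length
conformations = states_per_residue ** n_residues

femtosecond = 1e-15        # seconds: typical MD time-step
exhaustive_search = conformations * femtosecond   # seconds
seconds_per_year = 3.15e7

print(f"{conformations:.2e} conformations")
print(f"{exhaustive_search / seconds_per_year:.2e} years to enumerate them")
```

Even with these modest assumptions, exhaustive enumeration would take on the order of 10^25 years, which is why Nature's avoidance of the exponential trap is paradoxical.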

The variety of approaches developed is illustrated by two
community-wide experiments: CASP (*Critical Assessment of Techniques
for Protein Structure Prediction*; http://) and CAPRI (*Critical
Assessment of Prediction of Interactions*; http://).

As illustrated by the previous discussion, modeling macro-molecules touches upon biology, physics and chemistry, as well as mathematics and computer science. In the following, we present the topics investigated within ABS.

The research conducted by ABS focuses on three main directions in Computational Structural Biology (CSB), together with the associated methodological developments:

– Modeling interfaces and contacts,

– Modeling macro-molecular assemblies,

– Modeling the flexibility of macro-molecules,

– Algorithmic foundations.

**Keywords:** Docking, interfaces, protein complexes, structural alphabets,
scoring functions, Voronoi diagrams, arrangements of balls.

An interface is defined by the atoms of one partner *interacting*
with atoms of the second one. Understanding the structure of
interfaces is central to understanding biological complexes, and thus
the function of biological molecules . Yet, in spite of almost three
decades of investigations, the basic principles guiding the formation
of interfaces and accounting for their stability are unknown
. Current investigations follow two routes.
From the experimental perspective , directed
mutagenesis enables one to quantify the energetic importance of
residues, important residues being termed *hot* residues. Such
studies recently evidenced the *modular* architecture of
interfaces .
From the modeling perspective, the main issue consists of guessing the
hot residues from sequence and/or structural information
.

The description of interfaces is also of special interest to improve
*scoring functions*, e.g. functions which assign to a complex a
quantity homogeneous to a free energy change.
Describing interfaces poses problems in two settings: static and dynamic.

In the static setting, one seeks the minimalist geometric model
providing a relevant bio-physical signal. A first step in doing so
consists of identifying interface atoms, so as to relate the geometry and
the bio-chemistry at the interface level .
To elaborate at the atomic level, one seeks a structural alphabet
encoding the spatial structure of proteins. At the side-chain and
backbone level, an example of such an alphabet is that of
. At the atomic level, and in spite of recent
observations on the local structure of the neighborhood of a given
atom , no such alphabet is known. Specific
important local conformations are known, though. One of them is the
so-called dehydron structure, which is an under-desolvated hydrogen
bond, a property that can be directly inferred from the spatial
configuration of the non-polar groups surrounding the bond.

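As a caricature of the interface models discussed above, interface atoms can be identified with a plain distance cutoff; the toy coordinates and the 5 Å threshold below are illustrative assumptions, not the Voronoi-based definition.

```python
from math import dist

def interface_atoms(chain_a, chain_b, cutoff=5.0):
    """Indices of atoms of chain_a lying within `cutoff` (Angstrom) of some
    atom of chain_b -- a crude distance-based stand-in for the Voronoi-based
    interface models discussed in the text."""
    return [i for i, p in enumerate(chain_a)
            if any(dist(p, q) <= cutoff for q in chain_b)]

# Toy coordinates (Angstrom): only the second atom of A faces the atom of B.
A = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
B = [(6.0, 0.0, 0.0)]
print(interface_atoms(A, B))  # -> [1]
```

The cutoff value is the parameter such a model seeks to eliminate; Voronoi-based interface definitions are precisely parameter-free alternatives to this caricature.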
In the dynamic setting, one wishes to understand whether selected (hot) residues exhibit specific dynamic properties, so as to serve as anchors in a binding process . More generally, any significant observation raised in the static setting deserves investigations in the dynamic setting, so as to assess its stability. Such questions are also related to the problem of correlated motions, which we discuss next.

**Keywords:** Macro-molecular assembly, reconstruction by data
integration, proteomics, modeling with uncertainties, curved Voronoi
diagrams, topological persistence.

Large protein assemblies such as the Nuclear Pore Complex (NPC),
chaperonin cavities, the proteasome or ATP synthases, to name a few,
are key to numerous biological functions. To improve our
understanding of these functions, one would ideally like to build and
animate atomic models of these molecular machines. However, this task
is especially tough, due to their size and their plasticity, but also
due to the flexibility of the proteins involved.
In a sense, the modeling challenges arising in this context are
different from those faced for binary docking, and also from those
encountered for intermediate-size complexes, which are often amenable
to a treatment mixing (cryo-EM) image analysis and classical docking.
To face these new challenges, an emerging paradigm is that of
reconstruction by data integration . In a
nutshell, the strategy is reminiscent of NMR and consists of mixing
experimental data from a variety of sources, so as to find out the
model(s) best complying with the data.
This strategy has been in particular used to propose plausible models
of the Nuclear Pore Complex , the largest assembly
known to date in the eukaryotic cell, and consisting of 456 protein
*instances* of 30 *types*.

Reconstruction by data integration requires three ingredients. First,
a parametrized model must be adopted, typically a collection of balls
to model a protein with pseudo-atoms. Second, as in NMR, a functional
measuring the agreement between a model and the data must be
chosen. In , this functional is based upon *restraints*, namely penalties associated with the experimental data.
Third, an optimization scheme must be selected.
The design of restraints is notoriously challenging, due to the
ambiguous nature and/or the noise level of the data.
For example, Tandem Affinity Purification (TAP) gives access to a *pullout*, i.e. a list of protein types which are known to interact
with one tagged protein type, but no information on the number of
complexes or on the stoichiometry of protein types within a complex
is provided.
In cryo-EM, the envelope enclosing an assembly is often imprecisely
defined, in particular in regions of low density. For immuno-EM
labelling experiments, positional uncertainties arise from the
microscope resolution.
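The notion of a functional built from restraints can be sketched as follows; the harmonic penalty and the 1D toy model are illustrative assumptions, not the functional actually used in RDI software.

```python
def harmonic_restraint(d, d0, k=1.0):
    """Penalty vanishing when the observed distance d matches the target d0
    (an assumed, simple restraint form)."""
    return k * (d - d0) ** 2

def functional(model, restraints):
    """Score of a model = sum of restraint penalties. `model` maps a bead
    name to a 1D position (toy stand-in for ball centers); each restraint is
    (bead_i, bead_j, target_distance)."""
    total = 0.0
    for i, j, d0 in restraints:
        d = abs(model[i] - model[j])
        total += harmonic_restraint(d, d0)
    return total

model = {"A": 0.0, "B": 4.0, "C": 9.0}
restraints = [("A", "B", 4.0), ("B", "C", 5.0), ("A", "C", 10.0)]
print(functional(model, restraints))  # penalties: 0 + 0 + 1 = 1.0
```

An optimization scheme would then move the bead positions so as to drive this sum of penalties toward zero; the non-convexity mentioned below stems from summing many such conflicting terms.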

These uncertainties, coupled with the complexity of the functional
being optimized, which in general is non-convex, have two
consequences.
First, it is impossible to single out a unique reconstruction, and a
set of plausible reconstructions must be considered. As an example,
1000 plausible models of the NPC were reported in
. Interestingly, averaging the positions of all
balls of a particular protein type across these models resulted in 30
so-called *probability density maps*, each such map encoding the
probability of presence of a particular protein type at a particular
location in the NPC.
Second, the assessment of all models (individual and averaged) is
non-trivial. In particular, the lack of straightforward statistical
analysis of the individual models and the absence of assessment for
the averaged models are detrimental to the mechanistic exploitation of
the reconstruction results. At this stage, such models therefore
remain qualitative.

**Keywords:** Folding, docking, energy landscapes, induced fit,
molecular dynamics, conformers, conformer ensembles, point clouds,
reconstruction, shape learning, Morse theory.

Proteins in vivo vibrate at various frequencies: high frequencies
correspond to small-amplitude deformations of chemical bonds, while
low frequencies characterize more global deformations. This
flexibility contributes to the entropy, and thus the *free energy*, of
the system *protein - solvent*. From the experimental standpoint,
NMR studies generate ensembles of conformations, called *conformers*, and so do molecular dynamics (MD) simulations.
Of particular interest while investigating flexibility is the notion
of correlated motion. Intuitively, when a protein is folded, all
atomic movements must be correlated, a constraint which gets
alleviated when the protein unfolds since the steric constraints get
relaxed. Correlated motions are also central to binding, as they
underlie the *diffusion - conformer selection - induced fit* model
of complex formation.

Parameterizing these correlated motions, describing the corresponding energy landscapes, as well as handling collections of conformations pose challenging algorithmic problems.

At the side-chain level, the question of improving rotamer libraries is still of interest . This question is essentially a clustering problem in the parameter space describing the side-chain conformations.

At the atomic level, flexibility is essentially investigated by resorting to methods based on a classical potential energy (molecular dynamics), and on (inverse) kinematics. A molecular dynamics simulation provides a point cloud sampling the conformational landscape of the molecular system investigated, as each step in the simulation corresponds to one point in the parameter space describing the system (the conformational space) . The standard methodology to analyze such a point cloud consists of resorting to normal modes. Recently, though, more elaborate methods resorting to more local analyses , to Morse theory , and to the analysis of meta-stable states of time series have been proposed.
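The normal-mode / PCA style analysis of such a point cloud can be caricatured in two dimensions: the sketch below extracts the dominant principal direction of a synthetic point cloud via power iteration on its covariance matrix (the cloud and its anisotropy are fabricated for illustration).

```python
import random

def principal_axis(points, iters=200):
    """Dominant principal direction of a 2D point cloud, via power iteration
    on its 2x2 covariance matrix -- a minimal stand-in for the normal-mode /
    PCA analyses of MD point clouds mentioned in the text."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    pts = [(x - cx, y - cy) for x, y in points]
    sxx = sum(x * x for x, _ in pts) / n   # covariance entries
    syy = sum(y * y for _, y in pts) / n
    sxy = sum(x * y for x, y in pts) / n
    vx, vy = 1.0, 0.0
    for _ in range(iters):
        wx, wy = sxx * vx + sxy * vy, sxy * vx + syy * vy
        norm = (wx * wx + wy * wy) ** 0.5
        vx, vy = wx / norm, wy / norm
    return vx, vy

# Synthetic "trajectory": large-amplitude motion along x, jitter along y.
random.seed(0)
cloud = [(random.gauss(0, 5.0), random.gauss(0, 0.1)) for _ in range(500)]
ax = principal_axis(cloud)
print(ax)  # close to (1, 0): the dominant collective motion is along x
```

In a real setting, the points would live in the 3N-dimensional conformational space, and the leading eigenvectors would play the role of collective coordinates.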

**Keywords:** Computational geometry, computational topology,
optimization, data analysis.

Making a stride towards a better understanding of the biophysical questions discussed in the previous sections requires various methodological developments, which we briefly discuss now.

In modeling interfaces and contacts, one may favor geometric or topological information.

On the geometric side, the problem of modeling contacts at the atomic
level is tantamount to encoding multi-body relations between an atom
and its neighbors. On the one hand, one may use an encoding of
neighborhoods based on geometric constructions such as Voronoi
diagrams (affine or curved) or arrangements of balls. On the other
hand, one may resort to clustering strategies in higher-dimensional
spaces.

On the topological side, one may favor constructions which remain
stable if each atom in a structure *retains* the same neighbors,
even though the 3D positions of these neighbors change to some
extent. This process is observed in flexible docking cases, and calls
for the development of methods to encode and compare shapes undergoing
tame geometric deformations.

In dealing with large assemblies, a number of methodological developments are called for.

On the experimental side, of particular interest is the disambiguation of proteomics signals. For example, TAP and mass spectrometry data call for the development of combinatorial algorithms aiming at unraveling pairwise contacts between proteins within an assembly. Likewise, density maps coming from electron microscopy, which are often of intermediate resolution (5-10 Å), call for the development of noise-resilient segmentation and interpretation algorithms. The results produced by such algorithms can further be used to guide the docking of high-resolution crystal structures into the maps.

As for modeling, two classes of developments are particularly stimulating. The first one is concerned with the design of algorithms performing reconstruction by data integration, a process reminiscent of non-convex optimization. The second one encompasses assessment methods, in order to single out the reconstructions which best comply with the experimental data. For that endeavor, the development of geometric and topological models accommodating uncertainties is particularly important.

Given a sampling on an energy landscape, a number of fundamental questions actually arise: how does the point cloud describe the topography of the energy landscape (a question reminiscent of Morse theory)? Can one infer the effective number of degrees of freedom of the system over the simulation, and is this number varying? Answers to these questions would be of major interest to refine our understanding of folding and docking, with applications to the prediction of structural properties. It should be noted in passing that such questions are probably related to the modeling of phase transitions in statistical physics, where geometric and topological methods are being used .

From an algorithmic standpoint, such questions are reminiscent of
*shape learning*. Given a collection of samples on an (unknown) *model*, *learning* consists of guessing the model from the samples
—the result of this process may be called the *reconstruction*. In doing so, two types of guarantees are sought:
topologically speaking, the reconstruction and the model should
(ideally!) be isotopic; geometrically speaking, their Hausdorff
distance should be small.
Motivated by applications in Computer Aided Geometric Design, surface
reconstruction triggered a major activity in the Computational
Geometry community over the past ten years
. Aside from applications, reconstruction
raises a number of deep issues:
the study of distance functions to the model and to the samples,
and their comparison; the study of Morse-like constructions stemming from distance
functions to points; the analysis of topological invariants of the model and the samples,
and their comparison.
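The geometric guarantee mentioned above can be made concrete; the following is a minimal sketch of the (symmetric) Hausdorff distance between two finite point samples, the quantity one wishes to keep small between the reconstruction and the model.

```python
from math import dist

def hausdorff(P, Q):
    """Symmetric Hausdorff distance between two finite point sets:
    the largest distance from a point of one set to the other set."""
    def one_sided(X, Y):
        return max(min(dist(x, y) for y in Y) for x in X)
    return max(one_sided(P, Q), one_sided(Q, P))

P = [(0.0, 0.0), (1.0, 0.0)]
Q = [(0.0, 0.0), (1.0, 1.0)]
print(hausdorff(P, Q))  # -> 1.0
```

For samplings of continuous shapes rather than finite sets, the same definition applies with minima and maxima replaced by infima and suprema.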

As the name of the project-team suggests, *Algorithms-Biology-Structure* is primarily concerned with the
investigation of the structure-to-function relationship in structural
biology and biophysics.

This section briefly comments on all the software distributed by ABS. On the one hand, the software released in 2013 is briefly described, as the context is presented in the sections dedicated to new results. On the other hand, the software made available before 2013 is briefly specified in terms of the applications targeted.

In any case, the website advertising a given piece of software also makes the related publications available.

**Context.** Given the individual masses of the proteins
present in a complex, together with the mass of that complex, *stoichiometry determination* (SD) consists of computing how many
copies of each protein are needed to account for the overall mass of
the complex.
Our work on the stoichiometry determination (SD) problem for noisy
data in structural proteomics is described in
. The `addict` software suite not only
implements our algorithms `DP++` and `DIOPHANTINE`, but also
important algorithms to determine the so-called Frobenius number of a
vector of protein masses, and to estimate the number of solutions
of a SD problem by solving an unbounded knapsack problem.
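A naive version of stoichiometry determination can be sketched as an exhaustive search over copy numbers; this is an illustrative stand-in, not the `DP++` or `DIOPHANTINE` algorithms, and the masses below are fabricated.

```python
def stoichiometries(masses, total, tol=0.0):
    """Enumerate copy-number vectors c with |sum_i c_i * masses[i] - total|
    <= tol, by plain recursive search -- a naive stand-in for the dedicated
    algorithms of the addict suite."""
    sols = []
    def rec(i, remaining, counts):
        if i == len(masses):
            if abs(remaining) <= tol:
                sols.append(tuple(counts))
            return
        max_copies = int((remaining + tol) // masses[i])
        for c in range(max_copies + 1):
            rec(i + 1, remaining - c * masses[i], counts + [c])
    rec(0, total, [])
    return sols

# Toy masses (kDa) and a complex mass of 100 kDa: three stoichiometries fit.
print(stoichiometries([10.0, 25.0], 100.0))  # -> [(0, 4), (5, 2), (10, 0)]
```

Note that even this toy instance admits several solutions, which is precisely why counting solutions and handling noisy masses are central to the SD problem.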

**Distribution.** Binaries for the `addict` software suite
are made available from
http://

**Context.** Modeling protein binding patches, i.e. the
sets of atoms responsible for an interaction, is a central
problem in fostering our understanding of the stability and of the
specificity of macro-molecular interactions. We developed a binding
patch model which encodes morphological properties, allows an
atomic-level comparison of binding patches at the geometric and
topological levels, and allows estimating binding affinities, with
state-of-the-art results on the protein complexes of the binding
affinity benchmark.
Given a binary protein complex, `vorpatch` identifies the binding
patches, and computes a topological encoding of each patch, defined as
an *atom shelling tree* generalizing the core-rim model. The
program `compatch` allows comparing two patches via the
comparison of their atom shelling trees, by favoring either a
geometric or a topological comparison.

**Distribution.**
Binaries for `VORPATCH` and `COMPATCH` are available from
http://

**Context.** Large protein assemblies such as the Nuclear
Pore Complex (NPC), chaperonin cavities, the proteasome or ATP
synthases, to name a few, are key to numerous biological functions.
Modeling such assemblies is especially challenging due to their
plasticity (the proteins involved may change along the cell cycle),
their size, and also the flexibility of the sub-units. To cope with
these difficulties, a reconstruction strategy known as Reconstruction
by Data Integration (RDI) aims at integrating diverse experimental
data. But the uncertainties on the input data yield equally uncertain
reconstructed models, calling for quantitative assessment strategies.

To leverage these reconstruction results, we introduced the TOleranced Model (TOM) framework, which inherently accommodates uncertainties on the shape and position of proteins represented as density maps, be they maps from cryo-electron microscopy or maps stemming from reconstruction by data integration. In a TOM, a fuzzy molecule is sandwiched between two unions of concentric balls, the size of the region between these two unions conveying information on the uncertainties.

The corresponding software package, `VORATOM` , includes programs to
(i) perform the segmentation of (probability) density maps, (ii)
construct toleranced models, (iii) explore toleranced models
(geometrically and topologically), (iv) compute Maximal Common Induced
Sub-graphs (MCIS) and Maximal Common Edge Sub-graphs (MCES) to assess
the pairwise contacts encoded in a TOM.

**Distribution.** Binaries for the software package `VORATOM`
are made available from
http://

In collaboration with S. Loriot (The Geometry Factory)

**Context.**
Modeling the interfaces of macro-molecular complexes is key to improve
our understanding of the stability and specificity of such
interactions. We proposed a simple parameter-free model for
macro-molecular interfaces, which enables a multi-scale investigation
—from the atomic scale to the whole interface scale.
Our interface model improves on the state-of-the-art, allowing one to (i) identify
interface atoms, (ii) define interface patches, (iii) assess the
interface curvature, and (iv) investigate correlations between the
interface geometry and water dynamics / conservation patterns /
polarity of residues.

**Distribution.** The following website
http://

In collaboration with S. Loriot (The Geometry Factory, France)

**Context.** Molecular surfaces and volumes are paramount
to molecular modeling, with applications to electrostatic and energy
calculations, interface modeling, scoring and model evaluation, pocket
and cavity detection, etc. However, for molecular models represented
by collections of balls (Van der Waals and solvent accessible models),
such calculations are challenging in particular regarding
numerics. Because all available programs overlook numerical
issues, which in particular prevents them from qualifying the accuracy
of the results returned, we developed the first certified algorithm,
called `vorlume`. This program is based on so-called certified
predicates to guarantee the branching operations of the program, as
well as interval arithmetic to return an interval certified to contain
the exact value of each statistic of interest—in particular the
exact surface area and the exact volume of the molecular model
processed.
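For comparison, a plain Monte Carlo estimate of the volume of a union of balls is easy to sketch; unlike `vorlume`, it returns no certified interval, only a statistical estimate (the sample count and seed below are arbitrary choices).

```python
import random

def mc_union_volume(balls, n_samples=200_000, seed=42):
    """Monte Carlo estimate of the volume of a union of balls, each given as
    (center, radius): sample the bounding box, count hits. No certification,
    in contrast with vorlume's interval arithmetic."""
    rng = random.Random(seed)
    lo = [min(c[k] - r for c, r in balls) for k in range(3)]
    hi = [max(c[k] + r for c, r in balls) for k in range(3)]
    box = (hi[0] - lo[0]) * (hi[1] - lo[1]) * (hi[2] - lo[2])
    hits = 0
    for _ in range(n_samples):
        p = [rng.uniform(lo[k], hi[k]) for k in range(3)]
        if any((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 + (p[2] - c[2]) ** 2
               <= r * r for c, r in balls):
            hits += 1
    return box * hits / n_samples

vol = mc_union_volume([((0.0, 0.0, 0.0), 1.0)])
print(vol)  # close to 4*pi/3 = 4.19 for the unit ball
```

The statistical error decreases only as the inverse square root of the sample count, which is precisely the kind of unqualified accuracy that certified algorithms avoid.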

**Distribution.** Binaries for `Vorlume` are available
from http://

In collaboration with S. Loriot (The Geometry Factory, France) and J. Bernauer (Inria AMIB, France).

**Context.**
The ESBTL (Easy Structural Biology Template Library) is a lightweight
C++ library that allows the handling of PDB data and provides a data
structure suitable for geometric constructions and analyses, such as
those proposed by `INTERVOR`, `VORPATCH` and `COMPATCH`.

**Distribution.**
The C++ source code is available from
http://

**Keywords:** Docking, scoring, interfaces, protein complexes, Voronoi diagrams, arrangements of balls.

The work undertaken in this vein in 2013 will be finalized in 2014.

**Keywords:** Macro-molecular assembly, reconstruction by data integration, proteomics, modeling with uncertainties, curved Voronoi diagrams, topological persistence.

In collaboration with J. Araujo, C. Caillouet, D. Coudert, and S. Pérennes, from the COATI project-team (Inria-CNRS).

Consider the problem of inferring the pairwise contacts between the
proteins of an assembly, from the sole knowledge of the composition of
its sub-complexes; this is the minimum connectivity inference (MCI)
problem.
First, using a reduction from the set cover problem, we establish that
MCI is APX-hard.
Second, we show how to solve the problem to optimality using a mixed
integer linear programming formulation (`MILP` ).
Third, we develop a greedy algorithm based on union-find data
structures (`Greedy` ). Practically, the solutions computed by
`MILP` and `Greedy` are more parsimonious than
those reported by the algorithm initially developed in
biophysics, which are not qualified in terms of optimality. Since `MILP`
outputs a set of optimal solutions, we introduce the notion of
*consensus solution*. Using assemblies whose pairwise contacts
are known exhaustively, we show an almost perfect agreement between
the contacts predicted by our algorithms and the experimentally
determined ones, especially for consensus solutions.
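The flavor of a union-find based greedy strategy for connectivity inference can be sketched as follows; this toy version is not the `Greedy` algorithm above and claims no approximation guarantee: it merely picks, at each step, the edge that merges components in the largest number of sub-complexes.

```python
from itertools import combinations

class UnionFind:
    """Minimal union-find with path halving."""
    def __init__(self, items):
        self.parent = {x: x for x in items}
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x
    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[rx] = ry

def greedy_mci(pullouts):
    """Pick edges until every pullout (set of protein types) induces a
    connected subgraph -- a simplified sketch in the spirit of a union-find
    based greedy connectivity inference."""
    ufs = [UnionFind(p) for p in pullouts]
    candidates = {e for p in pullouts for e in combinations(sorted(p), 2)}
    edges = []
    def gain(e):  # number of pullouts in which e merges two components
        u, v = e
        return sum(1 for p, uf in zip(pullouts, ufs)
                   if u in p and v in p and uf.find(u) != uf.find(v))
    while True:
        best = max(sorted(candidates), key=gain)
        if gain(best) == 0:
            return edges
        edges.append(best)
        for p, uf in zip(pullouts, ufs):
            if best[0] in p and best[1] in p:
                uf.union(*best)

pullouts = [{"A", "B"}, {"B", "C"}, {"A", "B", "C"}]
print(greedy_mci(pullouts))  # -> [('A', 'B'), ('B', 'C')]
```

On this toy instance, two edges suffice to connect all three pullouts, which is clearly optimal; in general, only the `MILP` formulation certifies optimality.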

**Keywords:** Computational geometry, computational topology, Voronoi diagrams.

In collaboration with S. Sachdeva (Princeton University, USA), and N. Shah (Carnegie Mellon University, USA).

Choosing balls to best approximate a 3D object is a non-trivial
problem. To answer it, in , we first
address the *inner approximation* problem, which consists of
approximating an object from within by a union of balls. From an inner
approximation, we also derive an *outer approximation*
enclosing the initial shape, and an *interpolated approximation*
sandwiched between the inner and outer approximations.

The inner approximation problem is reduced to a geometric
generalization of the weighted max k-cover problem.

Implementation-wise, we present robust software incorporating the calculation of the exact Delaunay triangulation of points with degree two algebraic coordinates, of the exact medial axis of a union of balls, and of a certified estimate of the volume of a union of balls. Application-wise, we exhibit accurate coarse-grain molecular models using a number of balls 20 times smaller than the number of atoms, a key requirement to simulate crowded cellular environments.
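The combinatorial core of the reduction, greedy weighted max k-cover, can be sketched on plain finite sets; the geometric generalization handling balls is more involved, and the sets and weights below are fabricated.

```python
def greedy_max_k_cover(sets, weights, k):
    """Greedy heuristic for weighted max k-cover: pick k sets maximizing the
    total weight of covered elements (the classical greedy enjoys a (1 - 1/e)
    approximation guarantee). Finite sets stand in for balls here."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max(range(len(sets)),
                   key=lambda i: sum(weights[e] for e in sets[i] - covered))
        chosen.append(best)
        covered |= sets[best]
    return chosen, sum(weights[e] for e in covered)

sets = [{"a", "b"}, {"b", "c", "d"}, {"d", "e"}]
weights = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 1}
print(greedy_max_k_cover(sets, weights, 2))  # -> ([1, 0], 4)
```

In the geometric setting, each candidate ball "covers" a weighted region of the object, and selecting the k balls of largest covered volume plays the role of the set selection above.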

In collaboration with C. Robert (IBPC / CNRS, Paris, France), and C. Mueller (ETH, Zurich).

Morse theory provides a powerful framework to study the topology of a manifold from a function defined on it, but discrete constructions have remained elusive due to the difficulty of translating smooth concepts to the discrete setting.

Consider the problem of approximating the Morse-Smale (MS) complex of a Morse function from a point cloud and an associated nearest neighbor graph (NNG). Following the constructive proof of the Morse homology theorem, we present novel notions of critical points of any index, and the associated Morse-Smale diagram .

Our framework has three key advantages. First, it requires elementary data structures and operations, and is thus suitable for high-dimensional data processing. Second, it is gradient free, which makes it suitable to investigate functions whose gradient is unknown or expensive to compute. Third, in case of under-sampling and even if the exact (unknown) MS diagram is not found, the output conveys information in terms of ambiguous flow, and the Morse theoretical version of topological persistence, which consists in canceling critical points by flow reversal, applies.
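A gradient-free detection of candidate critical points from a function sampled on a nearest-neighbor graph can be sketched as follows; this toy version only classifies local minima and maxima, whereas the framework above also recovers saddles and the full MS diagram. The 1D chain and values below are fabricated, and boundary samples are classified like any other.

```python
def critical_candidates(values, neighbors):
    """Classify each sample as a local min / max of f over a neighbor graph:
    a gradient-free first step toward discrete Morse-Smale constructions."""
    minima, maxima = [], []
    for v, fv in enumerate(values):
        nbr_vals = [values[u] for u in neighbors[v]]
        if all(fv < x for x in nbr_vals):
            minima.append(v)
        if all(fv > x for x in nbr_vals):
            maxima.append(v)
    return minima, maxima

# Samples of a wiggly function along a 1D chain graph.
values = [0.0, -1.0, 0.5, 2.0, 0.3, -0.5, 1.0]
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
             for i in range(len(values))}
print(critical_candidates(values, neighbors))  # -> ([1, 5], [0, 3, 6])
```

Note that only function values at the samples are used, which is the gradient-free property stressed above; under-sampling shows up as spurious or missing critical candidates.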

On the experimental side, we present a comprehensive analysis of a large panel of bi-variate and tri-variate Morse functions whose Morse-Smale diagrams are known exactly, and show that these diagrams are recovered perfectly.

In a broader perspective, we see our framework as a first step to study complex dynamical systems from mere samplings consisting of point clouds.

**Keywords:** Computational Biology, Biomedicine.

Edited in collaboration with P. Kornprobst, from the Neuromathcomp project-team.

Biology and biomedicine are currently undergoing spectacular progress, due to a synergy between technological advances and inputs from physics, chemistry, mathematics, statistics and computer science. The goal of the book is to evidence this synergy, by describing selected developments in the following fields: bioinformatics, biomedicine, neuroscience.

This book is unique in two respects: first, by the variety and scales of the systems studied; second, by its presentation, as each chapter presents the biological or medical context, follows up with mathematical or algorithmic developments triggered by a specific problem, and concludes with one or two success stories, namely new insights gained thanks to these methodological developments. It also highlights some unsolved and outstanding theoretical questions, with a potentially high impact on these disciplines.

Two communities will be particularly interested. The first one is the vast community of applied mathematicians and computer scientists, whose interests should be captured by the added value generated by the application of advanced concepts and algorithms to challenging biological or medical problems. The second is the equally vast community of biologists. Whether scientists or engineers, they will find in this book a clear and self-contained account of concepts and techniques from mathematics and computer science, together with success stories on their favorite systems. The variety of systems described will act as an eye opener on a panoply of complementary conceptual tools. Practically, the resources listed at the end of each chapter (databases, software) will prove invaluable to get started on a specific topic.

Title: Modeling Large Protein Assemblies with Toleranced Models

Type: Projet Exploratoire Pluri-disciplinaire (PEPS) CNRS / Inria / INSERM

Duration: two years

Coordinator: F. Cazals (Inria, ABS)

Other partners: V. Doye (Inst. Jacques Monod)

Abstract: Reconstruction by Data Integration (RDI) is an emerging paradigm to reconstruct large protein assemblies, as discussed in section .

Elaborating on our Toleranced Models framework, a geometric framework
inherently accommodating uncertainties on the shapes and positions of
proteins within large assemblies, we aim, within the scope of the
two-year PEPS project entitled *Modeling Large Protein Assemblies
with Toleranced Models*, to
(i) design TOMs compatible with the flexibility of proteins, (ii)
develop graph-based analyses of TOMs, and (iii) perform experimental
validations on the NPC.

Title: Computational Geometric Learning (CGL)

Type: COOPERATION (ICT)

Defi: FET Open

Instrument: Specific Targeted Research Project (STREP)

Duration: November 2010 - October 2013

Coordinator: Friedrich-Schiller-Universität Jena (Germany)

Other partners: Jena Univ. (coord.), Inria (Geometrica Sophia, Geometrica Saclay, ABS), Tech. Univ. of Dortmund, Tel Aviv Univ., Nat. Univ. of Athens, Univ. of Groningen, ETH Zürich, Freie Univ. Berlin.

See also: http://

Abstract:
*The Computational Geometric Learning project aims at extending the
success story of geometric algorithms with guarantees to
high-dimensions. This is not a straightforward task. For many
problems, no efficient algorithms exist that compute the exact
solution in high dimensions. This behavior is commonly called the
curse of dimensionality. We try to address the curse of dimensionality
by focusing on inherent structure in the data like sparsity or low
intrinsic dimension, and by resorting to fast approximation
algorithms.*

ABS has regular international collaborations, in particular with the
members of the FP7 project *Computational Geometric Learning*
mentioned in section .

Angeliki Kalamara, from the University of Athens, performed a
five-month internship under the dual supervision of F. Cazals and
I. Emiris (Univ. of Athens). The topic was *Modeling
cryo-electron microscopy density maps*.

**(Winter school Algorithms in Structural Bio-informatics)**
Together with J. Cortès from LAAS / CNRS (Toulouse) and C. Robert
(IBPC/CNRS), F. Cazals organized the winter school *Algorithms
in Structural Bio-informatics*.

**(Statistical Learning Theory: a Short Course)** F. Cazals
organized a mini-course by P. Grunwald, from the CWI. The details
can be found at
https://

– F. Cazals was a member of the following program committees:

Symposium on Geometry Processing.

Computer Graphics International

ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics

International Conference on Pattern Recognition in Bioinformatics

Computational Intelligence Methods for Bioinformatics and Biostatistics

– F. Cazals was appointed as panel expert for the EU's 7th Framework Programme (FP7), Information and Communication Technologies /

Master: F. Cazals, *Geometric and topological modeling with
applications in biophysics*, 24h, Ecole Centrale Paris, France, 3rd
year of the engineering curriculum in applied mathematics.

Master: F. Cazals, *Algorithmic problems in computational
structural biology*, 24h, Master of Science in Computational Biology
from the University of Nice Sophia Antipolis,
France. (http://

Graduate level: F. Cazals and C. Robert,
*Analyzing conformational landscapes, with applications to the
design of collective coordinates*, 6h,
Winter school *Algorithms in Structural Bio-informatics*.

**(PhD thesis, ongoing)** C. Roth, *Modeling the
flexibility of macro-molecules: theory and applications*, University
of Nice Sophia Antipolis. Advisor: F. Cazals.

**(PhD thesis, ongoing)** A. Lheritier, *Scoring and
discriminating in high-dimensional spaces: a geometric based
approach of statistical tests*, University of Nice Sophia Antipolis.
Advisor: F. Cazals.

**(PhD thesis, ongoing)** D. Agarwal, *Towards
nano-molecular design: advanced algorithms for modeling large
protein assemblies*, University of Nice Sophia Antipolis.
Advisor: F. Cazals.

**(PhD thesis, ongoing)** S. Marillet, *Modeling antibody
- antigen complexes*, Univ. of Nice Sophia Antipolis. The thesis is
co-advised by F. Cazals and P. Boudinot (INRA Jouy-en-Josas).