**Computational Biology and Computational Structural Biology.**
Understanding the lineage between species and the genetic drift of
genes and genomes, apprehending the control and feed-back loops
governing the behavior of a cell, a tissue, an organ or a body, and
inferring the relationship between the structure of biological
(macro)-molecules and their functions are amongst the major challenges
of modern biology. The investigation of these challenges is
supported by three types of data: genomic data, transcription and
expression data, and structural data.

Genetic data feature sequences of nucleotides on DNA and RNA
molecules, and are symbolic data whose processing falls in the realm
of Theoretical Computer Science: dynamic programming, algorithms on
texts and strings, graph theory dedicated to phylogenetic problems.
Transcription and expression data feature evolving concentrations of
molecules (RNAs, proteins, metabolites) over time, and fit in the formalism of
discrete and continuous dynamical systems, and of graph theory. The
exploration and the modeling of these data are covered by a rapidly
expanding research field termed *systems biology*.
Structural data encode information about the 3D structures of
molecules (nucleic acids (DNA, RNA), proteins, small molecules) and their
interactions, and come from three main sources: X-ray
crystallography, NMR spectroscopy, and cryo-electron microscopy.
Ultimately, structural data should expand our understanding of how the
structure accounts for the function of macro-molecules —one of the
central questions in structural biology. This goal actually subsumes
two equally difficult challenges: *folding* —the process through
which a protein adopts its 3D structure— and *docking* —the process
through which two or several molecules assemble. Folding and docking
are driven by non-covalent interactions and, for complex systems, are
actually intertwined.
Apart from the bio-physical interests raised by these processes, two
different application domains are concerned: in fundamental biology,
one is primarily interested in understanding the machinery of the
cell; in medicine, applications to drug design are developed.

**Modeling in Computational Structural Biology.**
Acquiring structural data is not always possible: NMR is restricted to
relatively small molecules; membrane proteins do not crystallize, etc.
As a matter of fact, while the number of genomes sequenced is of the
order of one thousand, the Protein Data Bank contains circa 75,000
structures. Because one gene may yield several proteins through
splicing, it is difficult to estimate the number of proteins from the
number of genes; in any case, the number of proteins exceeds the
number of solved structures by several orders of magnitude.
For these reasons, *molecular modeling* is expected to play a key
role in investigating structural issues.

Ideally, bio-physical models of macro-molecules should resort to quantum mechanics. While this is possible for small systems, say up to 50 atoms, large systems are investigated within the framework of the Born-Oppenheimer approximation, which stipulates that the nuclei and the electron cloud can be decoupled. Example force fields developed in this realm are AMBER, CHARMM, and OPLS. Of particular importance are Van der Waals models, where each atom is modeled by a sphere whose radius depends on the atom's chemical type. From a historical perspective, Richards, and later Connolly, while defining molecular surfaces and developing algorithms to compute them, established the connections between molecular modeling and geometric constructions. Remarkably, a number of difficult problems (e.g. additively weighted Voronoi diagrams) were touched upon in these early days.
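The link between Van der Waals sphere models and additively weighted Voronoi diagrams can be made concrete with a small sketch: under the additively weighted distance (Euclidean distance minus the atom radius), each point of space is assigned to its nearest atom, and these assignments partition space into the cells of the diagram. The atoms, radii, and query point below are illustrative.

```python
import math

# Hypothetical atoms: (center, van der Waals radius in Angstroms).
# The radii are illustrative values for two chemical types.
atoms = [((0.0, 0.0, 0.0), 1.7),   # carbon-like
         ((2.0, 0.0, 0.0), 1.52)]  # oxygen-like

def aw_distance(p, center, radius):
    """Additively weighted distance: Euclidean distance minus the radius."""
    return math.dist(p, center) - radius

def owner(p):
    """Index of the atom whose additively weighted Voronoi cell contains p."""
    return min(range(len(atoms)),
               key=lambda i: aw_distance(p, atoms[i][0], atoms[i][1]))

# The midpoint is Euclidean-equidistant from both centers, yet belongs to
# the larger atom under the additively weighted distance.
print(owner((1.0, 0.0, 0.0)))  # -> 0
```

Note the contrast with the ordinary (affine) Voronoi diagram, which ignores radii and would split the segment between the two centers at its midpoint.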

The models developed in this vein are instrumental in investigating
the interactions of molecules for which no structural data is
available. But such models often fall short of providing complete
answers, which we illustrate with the folding problem. On the one
hand, as the conformations of side-chains belong to discrete sets (the
so-called rotamers or rotational isomers), the number of distinct
conformations of a poly-peptidic chain is exponential in the number of
amino-acids. On the other hand, Nature folds proteins within time
scales ranging from milliseconds to hours, which is out of reach for
simulations. The fact that Nature avoids the exponential trap is known
as Levinthal's paradox.
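Levinthal's argument is a simple piece of arithmetic, which the following sketch reproduces under illustrative assumptions (3 rotamers per residue, a 150-residue chain, and a deliberately optimistic sampling rate):

```python
# Back-of-the-envelope illustration of Levinthal's paradox.
# All numbers are illustrative assumptions, not measurements.
rotamers_per_residue = 3
n_residues = 150
conformations = rotamers_per_residue ** n_residues  # exponential in chain length

sampling_rate = 10 ** 13          # conformations per second (optimistic)
seconds_per_year = 3600 * 24 * 365

years = conformations / (sampling_rate * seconds_per_year)
print(f"{conformations:.2e} conformations, ~{years:.2e} years to enumerate")
```

Even under these generous assumptions, exhaustive enumeration takes vastly longer than the age of the universe, whereas Nature folds the protein in milliseconds to hours.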
The intrinsic difficulty of these problems calls for models exploiting
several classes of information. For small systems, *ab initio*
models can be built from first principles. But for more complex
systems, one resorts to *homology* or template-based models, which
integrate a variable amount of knowledge acquired on similar systems.

The variety of approaches developed is illustrated by the two
community-wide experiments CASP (*Critical Assessment of Techniques
for Protein Structure Prediction*) and CAPRI (*Critical Assessment of
PRediction of Interactions*).

As illustrated by the previous discussion, modeling macro-molecules touches upon biology, physics and chemistry, as well as mathematics and computer science. In the following, we present the topics investigated within ABS.

Three key achievements were obtained in 2012.

The first one deals with the problem of modeling high resolution protein complexes, a topic for which we came up with an original binding patch model. Our model not only provides more accurate descriptors of key quantities (the binding affinity in particular), but also sheds new light on the flexibility of proteins upon docking. These developments will in particular be used to investigate complexes from the immune system in the future.

The second one deals with the problem of modeling large protein assemblies, involving up to hundreds of polypeptide chains. We finalized the application of our Toleranced Models framework to the nuclear pore complex, and started to produce novel algorithms for mass-spectrometry data, an emerging technique to infer structural information on large molecular machines.

Finally, we have also made steady progress on algorithmic foundations, in particular on the problem of developing a Morse theory for point cloud data, with a view to analyzing molecular dynamics data. Tests are currently under way, so that this work will be advertised in 2013.

The research conducted by ABS focuses on two main directions in Computational Structural Biology (CSB), each such direction calling for specific algorithmic developments. These directions are:

- Modeling interfaces and contacts,

- Modeling the flexibility of macro-molecules.

Docking, interfaces, protein complexes, structural alphabets, scoring functions, Voronoi diagrams, arrangements of balls

**Problems addressed.**
The Protein Data Bank contains the structures of a growing number of
protein complexes, within which an interface is defined by the atoms
of one partner *interacting* with atoms of the second
one. Understanding the structure of interfaces is central to
understanding biological complexes, and thus the function of
biological molecules. Yet, in spite of almost three decades of
investigations, the basic principles guiding the formation of
interfaces and accounting for their stability are unknown. Current
investigations follow two routes. From the experimental perspective,
directed mutagenesis enables one to quantify the energetic importance
of residues, important residues being termed *hot* residues. Such
studies recently evidenced the *modular* architecture of interfaces.
From the modeling perspective, the main issue consists of guessing the
hot residues from sequence and/or structural information.

The description of interfaces is also of special interest to improve
*scoring functions*. By scoring function, two things are meant:
either a function which assigns to a complex a quantity homogeneous to
a free energy change, or a function ranking the relative stability of
putative complexes.

**Methodological developments.**
Describing interfaces poses problems in two settings:
static and dynamic.

In the static setting, one seeks the minimalist geometric model
providing a relevant bio-physical signal. A first step in doing so
consists of identifying interface atoms, so as to relate the geometry and
the bio-chemistry at the interface level.
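As a naive baseline for identifying interface atoms, one may flag the atoms of one partner lying within a distance cutoff of some atom of the other partner; the cutoff value and the toy coordinates below are illustrative, and geometric models (e.g. Voronoi-based interfaces) refine this crude criterion.

```python
import math

CUTOFF = 5.0  # Angstroms; an illustrative threshold, not a canonical value

# Toy coordinates for the atoms of the two partners of a complex.
partner_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
partner_b = [(3.0, 0.0, 0.0)]

def interface_atoms(a_atoms, b_atoms, cutoff=CUTOFF):
    """Indices of atoms in a_atoms within `cutoff` of any atom in b_atoms."""
    return [i for i, p in enumerate(a_atoms)
            if any(math.dist(p, q) <= cutoff for q in b_atoms)]

print(interface_atoms(partner_a, partner_b))  # -> [0]
```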
To elaborate at the atomic level, one seeks a structural alphabet
encoding the spatial structure of proteins. At the side-chain and
backbone levels, such alphabets have been proposed. At the atomic
level, and in spite of recent observations on the local structure of
the neighborhood of a given atom, no such alphabet is known. Specific
important local conformations are known, though. One of them is the
so-called dehydron structure, which is an under-desolvated hydrogen
bond —a property that can be directly inferred from the spatial
configuration of the atoms surrounding the bond.

A structural alphabet at the atomic level may be seen as an alphabet
featuring, for an atom of a given type, all the conformations this atom
may engage into, depending on its neighbors. One way to tackle this
problem consists of extending the notions of molecular surfaces used
so far, so as to encode multi-body relations between an atom and its
neighbors. In order to derive such alphabets,
the following two strategies are obvious. On the one hand, one may use an
encoding of neighborhoods based on geometric constructions such as
Voronoi diagrams (affine or curved) or arrangements of balls. On the
other hand, one may resort to clustering strategies in
higher-dimensional spaces, as the neighborhood of an atom can be
encoded by a point in such a space.

In the dynamic setting, one wishes to understand whether selected (hot) residues exhibit specific dynamic properties, so as to serve as anchors in a binding process. More generally, any significant observation raised in the static setting deserves investigations in the dynamic setting, so as to assess its stability. Such questions are also related to the problem of correlated motions, which we discuss next.

Macro-molecular assembly, reconstruction by data integration, proteomics, modeling with uncertainties, curved Voronoi diagrams, topological persistence.

Large protein assemblies such as the Nuclear Pore Complex (NPC),
chaperonin cavities, the proteasome or ATP synthases, to name a few,
are key to numerous biological functions. To improve our
understanding of these functions, one would ideally like to build and
animate atomic models of these molecular machines. However, this task
is especially tough, due to their size and their plasticity, but also
due to the flexibility of the proteins involved.
In a sense, the modeling challenges arising in this context are
different from those faced for binary docking, and also from those
encountered for intermediate size complexes which are often amenable
to a processing mixing (cryo-EM) image analysis and classical docking.
To face these new challenges, an emerging paradigm is that of
reconstruction by data integration. In a nutshell, the strategy is
reminiscent of NMR, and consists of mixing experimental data from a
variety of sources, so as to find out the model(s) best complying with
the data. This strategy has in particular been used to propose
plausible models of the Nuclear Pore Complex, the largest assembly
known to date in the eukaryotic cell, consisting of 456 protein
*instances* of 30 *types*.

Reconstruction by data integration requires three ingredients. First,
a parametrized model must be adopted, typically a collection of balls
to model a protein with pseudo-atoms. Second, as in NMR, a functional
measuring the agreement between a model and the data must be chosen;
this functional is typically based upon *restraints*, namely penalties
associated with the experimental data. Third, an optimization scheme
must be selected.
The design of restraints is notoriously challenging, due to the
ambiguous nature and/or the noise level of the data.
For example, Tandem Affinity Purification (TAP) gives access to a
*pullout*, i.e. a list of protein types which are known to interact
with one tagged protein type, but provides no information on the
number of complexes or on the stoichiometry of protein types within a
complex.
In cryo-EM, the envelope enclosing an assembly is often imprecisely
defined, in particular in regions of low density. For immuno-EM
labelling experiments, positional uncertainties arise from the
microscope resolution.
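The notion of a restraint-based functional can be illustrated with a toy sketch, in which the model is a set of pseudo-atom positions (one ball per protein) and each hypothetical contact restraint contributes a penalty growing with its violation; all names and values below are illustrative, not those of any published reconstruction.

```python
import math

# Toy model: one ball (center, radius) per protein type.
model = {"A": (0.0, 0.0, 0.0), "B": (4.0, 0.0, 0.0), "C": (20.0, 0.0, 0.0)}
radius = {"A": 2.0, "B": 2.0, "C": 2.0}

# Hypothetical restraints derived from TAP-like data: pairs of types
# expected to be in contact.
restraints = [("A", "B"), ("B", "C")]

def contact_penalty(t1, t2):
    """Zero when the two balls touch or overlap; quadratic in the gap otherwise."""
    gap = math.dist(model[t1], model[t2]) - (radius[t1] + radius[t2])
    return max(0.0, gap) ** 2

def score(restraints):
    """Total disagreement between model and data; 0 means all restraints hold."""
    return sum(contact_penalty(t1, t2) for t1, t2 in restraints)

print(score(restraints))  # A-B touches (gap 0); B-C is violated by 12 -> 144.0
```

An optimization scheme would then move the balls so as to drive this score toward zero, subject to whatever other restraints the data provide.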

These uncertainties, coupled with the complexity of the functional
being optimized, which in general is non-convex, have two
consequences.
First, it is impossible to single out a unique reconstruction, and a
set of plausible reconstructions must be considered. As an example,
1000 plausible models of the NPC were reported. Interestingly,
averaging the positions of all balls of a particular protein type
across these models resulted in 30 so-called *probability density
maps*, each such map encoding the probability of presence of a
particular protein type at a particular location in the NPC.
Second, the assessment of all models (individual and averaged) is
non-trivial. In particular, the lack of straightforward statistical
analysis of the individual models, and the absence of assessment for
the averaged models, are detrimental to the mechanistic exploitation of
the reconstruction results. At this stage, such models therefore
remain qualitative.
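The averaging step yielding probability density maps can be sketched as follows, in one dimension for brevity: the positions of a given protein type, collected across plausible models, are binned, and the normalized counts approximate the probability of presence at each location. Positions and bin size are illustrative.

```python
from collections import Counter

# Positions of one protein type across 4 hypothetical models (1D for brevity).
positions = [1.1, 1.3, 1.0, 4.2]

def density_map(samples, bin_size=1.0):
    """Normalized histogram: bin index -> estimated probability of presence."""
    counts = Counter(int(x // bin_size) for x in samples)
    n = len(samples)
    return {b: c / n for b, c in counts.items()}

print(density_map(positions))  # bin 1 has weight 0.75, bin 4 has weight 0.25
```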

As outlined by the previous discussion, a number of methodological developments are called for. On the experimental side, the problem of fostering the interpretation of data is under scrutiny. Of particular interest is the disambiguation of proteomics signals (TAP data, mass spectrometry data), and that of density maps coming from electron microscopy. As for modeling, two classes of developments are particularly stimulating. The first one is concerned with the design of algorithms performing reconstruction by data integration. The second one encompasses assessment tools, in order to single out the reconstructions which best comply with the experimental data.

Folding, docking, energy landscapes, induced fit, molecular dynamics, conformers, conformer ensembles, point clouds, reconstruction, shape learning, Morse theory

**Problems addressed.**
Proteins in vivo vibrate at various frequencies: high frequencies
correspond to small-amplitude deformations of chemical bonds, while
low frequencies characterize more global deformations. This
flexibility contributes to the entropy, and thus to the free energy, of
the system *protein - solvent*. From the experimental standpoint,
NMR studies and molecular dynamics simulations generate ensembles of
conformations, called *conformers*.
Of particular interest while investigating flexibility is the notion
of correlated motion. Intuitively, when a protein is folded, all
atomic movements must be correlated, a constraint which gets
alleviated when the protein unfolds, since the steric constraints get
relaxed. Correlated motions are also central to the *diffusion -
conformer selection - induced fit* complex formation model.

Parameterizing these correlated motions, describing the corresponding energy landscapes, as well as handling collections of conformations pose challenging algorithmic problems.

**Methodological developments.**
At the side-chain level, the question of improving rotamer libraries
is still of interest. This question is
essentially a clustering problem in the parameter space describing the
side-chain conformations.

At the atomic level, flexibility is essentially investigated by resorting to methods based on a classical potential energy (molecular dynamics) and on (inverse) kinematics. A molecular dynamics simulation provides a point cloud sampling the conformational landscape of the molecular system investigated, as each step in the simulation corresponds to one point in the parameter space describing the system (the conformational space). The standard methodology to analyze such a point cloud consists of resorting to normal modes. Recently, though, more elaborate methods resorting to more local analyses, to Morse theory, and to the analysis of meta-stable states of time series have been proposed.

Given a sampling on an energy landscape, a number of fundamental issues actually arise: how does the point cloud describe the topography of the energy landscape (a question reminiscent of Morse theory)? Can one infer the effective number of degrees of freedom of the system over the simulation, and is this number varying? Answers to these questions would be of major interest to refine our understanding of folding and docking, with applications to the prediction of structural properties. It should be noted in passing that such questions are probably related to modeling phase transitions in statistical physics, where geometric and topological methods are being used.
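The question of the effective number of degrees of freedom can be approached, as a rough first cut, by principal component analysis of the conformational point cloud, counting the components needed to explain most of the variance. The synthetic cloud and the 90% threshold below are illustrative assumptions, not a substitute for the finer (Morse-theoretic) analyses mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
# 500 "conformations" in a 10-dimensional parameter space, with only two
# directions of large variance (a low intrinsic dimension, by design).
cloud = rng.normal(size=(500, 10)) * np.array([5.0, 3.0] + [0.1] * 8)

def effective_dof(points, explained=0.9):
    """Number of principal components explaining `explained` of the variance."""
    centered = points - points.mean(axis=0)
    # Squared singular values = variances along the principal axes (up to n-1).
    variances = np.linalg.svd(centered, compute_uv=False) ** 2
    ratios = np.cumsum(variances) / variances.sum()
    return int(np.searchsorted(ratios, explained) + 1)

print(effective_dof(cloud))  # -> 2
```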

From an algorithmic standpoint, such questions are reminiscent of
*shape learning*. Given a collection of samples on an (unknown) *model*, *learning* consists of guessing the model from the samples
—the result of this process may be called the *reconstruction*. In doing so, two types of guarantees are sought:
topologically speaking, the reconstruction and the model should
(ideally!) be isotopic; geometrically speaking, their Hausdorff
distance should be small.
Motivated by applications in Computer Aided Geometric Design, surface
reconstruction triggered a major activity in the Computational
Geometry community over the past ten years.
Aside from applications, reconstruction raises a number of deep issues:
the study of distance functions to the model and to the samples,
and their comparison;
the study of Morse-like constructions stemming from distance
functions to points;
the analysis of topological invariants of the model and the samples,
and their comparison.
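The geometric guarantee just mentioned can be made concrete by computing the (symmetric) Hausdorff distance between two finite point sets; the toy samplings of a model and its reconstruction below are illustrative.

```python
import math

def hausdorff(P, Q):
    """Symmetric Hausdorff distance between two finite point sets P and Q."""
    def directed(A, B):
        # Largest distance from a point of A to its nearest point of B.
        return max(min(math.dist(a, b) for b in B) for a in A)
    return max(directed(P, Q), directed(Q, P))

model_samples = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
reconstruction = [(0.0, 0.1), (1.0, 0.0), (2.0, -0.1)]
print(hausdorff(model_samples, reconstruction))  # small: the sets nearly agree
```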

Last but not least, gaining insight on such questions would also help
to effectively select a reduced set of conformations best representing
a larger number of conformations. This selection problem is indeed
faced by flexible docking algorithms that need to maintain and/or
update collections of conformers for the second stage of the *diffusion - conformer selection - induced fit* complex formation
model.

This section briefly comments on all the software distributed by ABS. On the one hand, the software released in 2012 is briefly described as the context is presented in the sections dedicated to new results. On the other hand, the software made available before 2012 is briefly specified in terms of applications targeted.

In any case, the website advertising a given software also makes related publications available.

**Context.** Our work on the stoichiometry determination
(SD) problem for noisy data in structural proteomics is described in
the new results below. The `addict` software suite not only
implements our algorithms `DP++` and `DIOPHANTINE`, but also
important algorithms to determine the so-called Frobenius number of a
vector of protein masses, and to estimate the number of solutions
of an SD problem, modeled as an unbounded knapsack problem.

**Distribution.** Binaries for the `addict` software suite
are made available from
http://

**Context.** Modeling protein binding patches is central
to fostering our understanding of the stability and of the
specificity of macro-molecular interactions. We developed a binding
patch model which encodes morphological properties, allows an
atomic-level comparison of binding patches at the geometric and
topological levels, and allows estimating binding affinities—with
state-of-the-art results on the protein complexes of the binding
affinity benchmark. Given a protein complex, `vorpatch` computes
the binding patches, while the program `compatch` allows
comparing two patches.

**Distribution.**
Binaries for `VORPATCH` and `COMPATCH` are available from
http://

**Context.** Large protein assemblies such as the Nuclear
Pore Complex (NPC), chaperonin cavities, the proteasome or ATP
synthases, to name a few, are key to numerous biological functions.
Modeling such assemblies is especially challenging due to their
plasticity (the proteins involved may change along the cell cycle),
their size, and also the flexibility of the sub-units. To cope with
these difficulties, a reconstruction strategy known as Reconstruction
by Data Integration (RDI), aims at integrating diverse experimental
data. But the uncertainties on the input data yield equally uncertain
reconstructed models, calling for quantitative assessment strategies.

To leverage these reconstruction results, we introduced the TOleranced
Model (TOM) framework, which inherently accommodates uncertainties on the
shape and position of proteins.
The corresponding software package, `VORATOM`, includes programs to
(i) perform the segmentation of (probability) density maps, (ii)
construct toleranced models, (iii) explore toleranced models
(geometrically and topologically), (iv) compute Maximal Common Induced
Sub-graphs (MCIS) and Maximal Common Edge Sub-graphs (MCES) to assess
the pairwise contacts encoded in a TOM.

**Distribution.** Binaries for the software package `VORATOM`
are made available from
http://

**Context.**
Given a snapshot of a molecular dynamics simulation, a classical
problem consists of *quenching* that structure—minimizing the
potential energy of the solute together with selected layers of
solvent molecules. The program `wsheller` provides a solution to
the water layer selection, and incorporates a topological control of
the layers selected.

**Distribution.**
Binaries for `wsheller` are available from
http://

In collaboration with S. Loriot (The Geometry Factory)

**Context.**
Modeling the interfaces of macro-molecular complexes is key to improve
our understanding of the stability and specificity of such
interactions. We proposed a simple parameter-free model for
macro-molecular interfaces, which enables a multi-scale investigation
—from the atomic scale to the whole interface scale.
Our interface model improves the state-of-the-art to (i) identify
interface atoms, (ii) define interface patches, (iii) assess the
interface curvature, (iv) investigate correlations between the
interface geometry and water dynamics / conservation patterns /
polarity of residues.

**Distribution.** The following website
http://

In collaboration with S. Loriot (The Geometry Factory, France)

**Context.** Molecular surfaces and volumes are paramount
to molecular modeling, with applications to electrostatic and energy
calculations, interface modeling, scoring and model evaluation, pocket
and cavity detection, etc. However, for molecular models represented
by collections of balls (Van der Waals and solvent accessible models),
such calculations are challenging in particular regarding
numerics. Because all available programs are overlooking numerical
issues, which in particular prevents them from qualifying the accuracy
of the results returned, we developed the first certified algorithm,
called `vorlume`. This program is based on so-called certified
predicates to guarantee the branching operations of the program, as
well as interval arithmetic to return an interval certified to contain
the exact value of each statistic of interest—in particular the
exact surface area and the exact volume of the molecular model
processed.
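The interval-arithmetic idea underlying such certified computations can be illustrated on a single sphere: rather than one float, one returns an interval certified to contain the exact value. The sketch below brackets the area 4πr² by rounding π both ways; it is a toy, not the `vorlume` algorithm, which handles unions of balls.

```python
import math

# Certified lower and upper bounds on pi (pi = 3.14159265358979311...).
PI_LO, PI_HI = 3.14159265358979, 3.14159265358980

def sphere_area_interval(r):
    """Interval [lo, hi] containing the exact sphere area 4*pi*r**2."""
    return (4.0 * PI_LO * r * r, 4.0 * PI_HI * r * r)

lo, hi = sphere_area_interval(1.7)  # approx. van der Waals radius of carbon
assert lo <= 4 * math.pi * 1.7 ** 2 <= hi  # the exact value is bracketed
print(lo, hi)
```

A production-grade version would also round the multiplications outward; here the bounds on π dominate the floating-point error, so the bracketing holds.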

**Distribution.** Binaries for `Vorlume` are available
from http://

In collaboration with S. Loriot (The Geometry Factory, France) and J. Bernauer (Inria AMIB, France).

**Context.**
The ESBTL (Easy Structural Biology Template Library) is a lightweight
C++ library that allows the handling of PDB data and provides a data
structure suitable for geometric constructions and analyses.

**Distribution.**
The C++ source code is available from
http://

In collaboration with N. Yanev (University of Sofia, and IMI at Bulgarian Academy of Sciences, Bulgaria), and R. Andonov (Inria Rennes - Bretagne Atlantique, and IRISA/University of Rennes 1, France).

**Context.**
Structural similarity between proteins provides significant insights
about their functions. Contact Map Overlap maximization (CMO)
has received sustained attention during the past decade and can be
considered today as a credible protein structure similarity measure.
The solver `A_purva` is an exact CMO solver that is both
efficient (notably faster than the previous exact algorithms), and
reliable (providing accurate upper and lower bounds of the
solution). These properties make it applicable for large-scale protein
comparison and classification.

**Distribution.**
The software is available from
http://

Docking, scoring, interfaces, protein complexes, scoring functions, Voronoi diagrams, arrangements of balls.

In collaboration with C. Robert (IBPC / CNRS, Paris, France).

While proteins and nucleic acids are the fundamental components of an organism, Biology itself is based on the interactions they make with each other. Analyzing macromolecular interactions typically requires handling systems involving from two to hundreds of polypeptide chains. After a brief overview of the modeling challenges faced in computational structural biology, the text reviews concepts and tools aiming at improving our understanding of the link between the static structures of macromolecular complexes and their biophysical/biological properties. We discuss geometrical approaches suited to atomic-resolution complexes and to large protein assemblies; for each, we also present examples of their successful application in quantifying and interpreting biological data. This methodology includes state-of-the-art geometric analyses of surface area, volume, curvature, and topological properties (isolated components, cavities, voids, cycles) related to Voronoi constructions in the context of structure analysis. On the applied side, we present novel insights into real biological problems gained thanks to these modeling tools.

In collaboration with I. Wohlers (CWI / VU University Amsterdam, Netherlands), R. Andonov (Irisa / Rennes University, France), G.W. Klau (CWI / VU University Amsterdam, Netherlands).

Protein structural alignment is a key method for answering many
biological questions involving the transfer of information from
well-studied proteins to less well-known proteins. Since structures
are more conserved during evolution than sequences, structural
alignment allows for the most precise mapping of equivalent
residues. Many structure-based scoring schemes have been proposed and
there is no consensus on which scoring is the best. Comparative
studies also show that alignments produced by different methods can
differ considerably. Based on the alignment engine derived from
A_purva, we designed CSA (Comparative Structural Alignment), the
first web server for computation, evaluation and comprehensive
comparison of pairwise protein structure alignments at single residue
level . It offers the exact computation of
alignments using the scoring schemes of DALI, Contact Map Overlap
(CMO), MATRAS and PAUL. In CSA, computed or uploaded alignments can be
explored in terms of many inter-residue distances, RMSD, and
sequence-based scores. Intuitive visualizations also help in grasping
the agreements and differences between alignments. The user can thus
make educated decisions about the structural similarity of two
proteins and, if necessary, post-process alignments by hand. CSA is
available at http://

Macro-molecular assembly, reconstruction by data integration, proteomics, modeling with uncertainties, curved Voronoi diagrams, topological persistence.

In structural proteomics, given the individual masses of a set of
protein types and the exact mass of a protein complex, the *exact
stoichiometry determination problem (SD)*, also known as the
money-changing problem, consists of enumerating all the stoichiometries
of these types which allow one to recover the target mass.
If the target mass suffers from experimental uncertainties, the *interval SD problem* consists of finding all the stoichiometry vectors
compatible with a target mass within an interval.
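The interval SD problem lends itself to a brute-force enumeration in the spirit of an unbounded knapsack, sketched below; the masses and the target interval are illustrative, and this recursion is not the `DP++` or `DIOPHANTINE` algorithm.

```python
def interval_sd(masses, lo, hi):
    """All vectors (n_1, ..., n_k) with lo <= sum(n_i * masses[i]) <= hi."""
    solutions = []

    def extend(i, vec, total):
        if i == len(masses):
            if lo <= total <= hi:
                solutions.append(tuple(vec))
            return
        # Try every copy number for type i that keeps the mass below hi.
        n = 0
        while total + n * masses[i] <= hi:
            extend(i + 1, vec + [n], total + n * masses[i])
            n += 1

    extend(0, [], 0.0)
    return solutions

# Two hypothetical protein types of mass 10 and 25 kDa; target 50 +/- 5 kDa.
print(interval_sd([10.0, 25.0], 45.0, 55.0))
# -> [(0, 2), (2, 1), (3, 1), (5, 0)]
```

The search space grows quickly with the number of types, which is precisely why dedicated dynamic-programming and Diophantine approaches are needed for realistic instances.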

The programs accompanying this paper are available from
http://

Voronoi diagrams,

The work undertaken in this vein in 2012 will be finalized in 2013.

Immune response, infection, antibodies, complementarity determining region (CDR)

In collaboration with

R. Castro, L. Journeau, A. Benmansour and P. Boudinot (INRA Jouy-en-Josas, France)

H.P. Pham and A. Six (Univ. of Paris VI, France)

O. Bouchez (INRA Castanet Tolosan, France)

V. Giudicelli and M-P. Lefranc (IMGT / CNRS, Montpellier, France)

E. Quillet (INRA Jouy-en-Josas, France)

S. Fillatreau (Leibniz Institute, Berlin, Germany)

O. Sunyer (Univ. of Pennsylvania, USA)

Upon infection, B-lymphocytes expressing antibodies specific for the
intruding pathogen develop clonal responses triggered by pathogen
recognition via the B-cell receptor. The constant region of antibodies
produced by such developing clones dictates their functional
properties. In teleost fish, the clonal structure of B-cell responses
and the respective contribution of the three isotypes IgM, IgD, and
IgT remains unknown. The expression of IgM and IgT is mutually
exclusive, leading to the existence of two B-cell subsets expressing
either both IgM and IgD, or only IgT. In this study, we
undertook a comprehensive analysis of the variable heavy chain (VH)
domain repertoires of the IgM, IgD and IgT in spleen of homozygous
isogenic rainbow trout (*Oncorhynchus mykiss*), before and after
challenge with a rhabdovirus, the Viral Hemorrhagic Septicemia Virus
(VHSV), using CDR3-length spectratyping and pyrosequencing of
immunoglobulin (Ig) transcripts.
In healthy fish, we observed distinct repertoires for IgM, IgD and IgT
respectively, with a few amplified clones.

Reconstruction by Data Integration (RDI) is an emerging paradigm to reconstruct large protein assemblies, as discussed above.

Elaborating on our Toleranced Models framework, a geometric framework
aiming at inherently accommodating uncertainties on the shapes and
positions of proteins within large assemblies, we aim, within the
scope of the two-year PEPS project entitled *Modeling Large
Protein Assemblies with Toleranced Models*, to
(i) design TOMs compatible with the flexibility of proteins, (ii)
develop graph-based analyses of TOMs, and (iii) perform experimental
validations on the NPC.

Title: Computational Geometric Learning (CGL)

Type: COOPERATION (ICT)

Defi: FET Open

Instrument: Specific Targeted Research Project (STREP)

Duration: November 2010 - October 2013

Coordinator: Friedrich-Schiller-Universität Jena (Germany)

Other partners: Jena Univ. (coord.), Inria (Geometrica Sophia, Geometrica Saclay, ABS), Tech. Univ. of Dortmund, Tel Aviv Univ., Nat. Univ. of Athens, Univ. of Groningen, ETH Zürich, Freie Univ. Berlin.

See also: http://

Abstract:
*The Computational Geometric Learning project aims at extending the
success story of geometric algorithms with guarantees to
high-dimensions. This is not a straightforward task. For many
problems, no efficient algorithms exist that compute the exact
solution in high dimensions. This behavior is commonly called the
curse of dimensionality. We try to address the curse of dimensionality
by focusing on inherent structure in the data like sparsity or low
intrinsic dimension, and by resorting to fast approximation
algorithms.*

From May to July 2012, summer internship of Pratik Kumar (Indian Institute of Technology Bombay). Topic: modeling density maps in cryo-electron microscopy.

– F. Cazals was a member of the following program committees:

Symposium on Geometry Processing.

Geometric Modeling and Processing.

ACM Symposium on Solid and Physical Modeling.

IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology.

International conference on Pattern Recognition in Bioinformatics.

– F. Cazals is member of the scientific committee of *GDR Bio-informatique-Moléculaire*, in charge of activities related
to computational structural biology.

Having initiated and coordinated the Master of Science in
Computational Biology (see http://), F. Cazals pursued the preparation
of the book *Modeling in Computational Biology and Medicine: A
Multidisciplinary Endeavor*, with one chapter per class taught in this
program.

**(Master)** Ecole Centrale Paris, France, 3rd year of the
engineering curriculum in applied mathematics. Course on *Geometric and topological modeling with applications in
biophysics*, taught by F. Cazals (24h).

**(Master)** University of Nice Sophia Antipolis, France,
Master of Science in Computational Biology
(http://). Course on *Algorithmic problems in
computational structural biology*, taught by F. Cazals (24h).

**(Winter school Algorithms in Structural Bio-informatics)**
Together with J. Cortès from LAAS / CNRS (Toulouse), F. Cazals
is organizing the winter school *Algorithms in Structural
Bio-informatics*.

PhD & HdR:

**(PhD thesis, ongoing)** C. Roth, *Modeling the
flexibility of macro-molecules: theory and applications*, University
of Nice Sophia Antipolis. Advisor: F. Cazals.

**(PhD thesis, ongoing)** A. Lheritier, *Scoring and
discriminating in high-dimensional spaces: a geometric based
approach of statistical tests*, University of Nice Sophia Antipolis.
Advisor: F. Cazals.

**(PhD thesis, ongoing)** D. Agarwal, *Towards
nano-molecular design: advanced algorithms for modeling large
protein assemblies*, University of Nice Sophia Antipolis.
Advisor: F. Cazals.