**Computational Biology and Computational Structural Biology.**
Understanding the lineage between species and the genetic drift of
genes and genomes, apprehending the control and feedback loops
governing the behavior of a cell, a tissue, an organ or a body, and
inferring the relationship between the structure of biological
(macro)-molecules and their functions are amongst the major challenges
of modern biology. The investigation of these challenges is
supported by three types of data: genomic data, transcription and
expression data, and structural data.

Genomic data feature sequences of nucleotides on DNA and RNA
molecules, and are symbolic data whose processing falls in the realm
of Theoretical Computer Science: dynamic programming, algorithms on
texts and strings, and graph theory dedicated to phylogenetic problems.
Transcription and expression data feature evolving concentrations of
molecules (RNAs, proteins, metabolites) over time, and fit in the formalism of
discrete and continuous dynamical systems, and of graph theory. The
exploration and the modeling of these data are covered by a rapidly
expanding research field termed *systems biology*.
Structural data encode information about the 3D structures of
molecules (nucleic acids (DNA, RNA), proteins, small molecules) and their
interactions, and come from three main sources: X-ray
crystallography, NMR spectroscopy, and cryo-electron microscopy.
Ultimately, structural data should expand our understanding of how the
structure accounts for the function of macro-molecules – one of the
central questions in structural biology. This goal actually subsumes
two equally difficult challenges, which are *folding* – the
process through which a protein adopts its 3D structure, and *docking* – the process through which two or several molecules
assemble. Folding and docking are driven by non-covalent interactions
and, for complex systems, are actually intertwined.
Apart from the bio-physical interests raised by these processes, two
different application domains are concerned: in fundamental biology,
one is primarily interested in understanding the machinery of the
cell; in medicine, applications to drug design are developed.

**Modeling in Computational Structural Biology.**
Acquiring structural data is not always possible: NMR is restricted to
relatively small molecules; membrane proteins are notoriously difficult to crystallize, etc.
As a matter of fact, the number of genomes sequenced is of the order of
one thousand, which results in circa one million genes recorded in the
manually curated Swiss-Prot database.
On the other hand, the Protein Data Bank contains circa 90,000
structures. Thus, the paucity of structures with respect to the
number of known genes calls for modeling in structural biology, so as to
foster our understanding of the structure-to-function relationship.

Ideally, bio-physical models of macro-molecules should resort to quantum mechanics. While this is possible for small systems, say up to 50 atoms, large systems are investigated within the framework of the Born-Oppenheimer approximation, which stipulates that the nuclei and the electron cloud can be decoupled. Example force fields developed in this realm are AMBER, CHARMM and OPLS. Of particular importance are Van der Waals models, where each atom is modeled by a sphere whose radius depends on the chemical type of the atom. From a historical perspective, Richards, and later Connolly, while defining molecular surfaces and developing algorithms to compute them, established the connections between molecular modeling and geometric constructions. Remarkably, a number of difficult problems (e.g. additively weighted Voronoi diagrams) were touched upon in these early days.
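As a minimal sketch of a Van der Waals model, the Python snippet below represents each atom by a ball and reports the pairs of balls that intersect. The radii are Bondi's values, taken here as an assumption (force fields use their own parameters); the function name and the toy coordinates are hypothetical and do not come from any real structure.

```python
import math
from itertools import combinations

# Bondi van der Waals radii in angstroms (assumed convention; force fields differ).
VDW_RADIUS = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

def intersecting_pairs(atoms):
    """Return index pairs (i, j) whose van der Waals balls intersect.

    `atoms` is a list of (element, (x, y, z)) tuples, coordinates in angstroms.
    """
    pairs = []
    for (i, (ei, pi)), (j, (ej, pj)) in combinations(enumerate(atoms), 2):
        # Two balls intersect when the centers are closer than the sum of radii.
        if math.dist(pi, pj) < VDW_RADIUS[ei] + VDW_RADIUS[ej]:
            pairs.append((i, j))
    return pairs

# Hypothetical toy coordinates, for illustration only.
toy = [("C", (0.0, 0.0, 0.0)), ("O", (1.2, 0.0, 0.0)), ("N", (6.0, 0.0, 0.0))]
print(intersecting_pairs(toy))
```

The quadratic loop is only meant for small examples; geometric constructions such as Voronoi diagrams are what make such neighborhood queries tractable on large molecules.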

The models developed in this vein are instrumental in investigating
the interactions of molecules for which no structural data is
available. But such models often fall short of providing complete
answers, which we illustrate with the folding problem. On one hand, as
the conformations of side-chains belong to discrete sets (the
so-called rotamers or rotational isomers),
the number of distinct conformations of a poly-peptidic chain is
exponential in the number of amino-acids. On the other hand, Nature
folds proteins within time scales ranging from milliseconds to hours,
while time-steps used in molecular dynamics simulations are of the
order of the femtosecond, so that biologically relevant time-scales
are out of reach for simulations. The fact that Nature avoids the
exponential trap is known as Levinthal's paradox.
The intrinsic difficulty of these problems calls for models exploiting
several classes of information. For small systems, *ab initio*
models can be built from first principles. But for more complex
systems, *homology* or template-based models integrating a variable amount of
knowledge acquired on similar systems are resorted to.
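The orders of magnitude behind the folding discussion can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the value of three states per residue is an illustrative assumption, not a figure from this report, and both function names are hypothetical.

```python
def conformation_count(n_residues, states_per_residue=3):
    """Number of conformations if each residue independently picks one of k states."""
    return states_per_residue ** n_residues

def md_steps(duration_s, timestep_s=1e-15):
    """Number of integration steps needed to simulate `duration_s` seconds."""
    return round(duration_s / timestep_s)

n = conformation_count(100)     # 3**100, about 5e47 conformations
steps_for_1ms = md_steps(1e-3)  # 10**12 femtosecond steps for one millisecond
print(f"{n:.2e} conformations, {steps_for_1ms:.0e} steps")
```

Even under these crude assumptions, exhaustive enumeration is hopeless, and a single millisecond of dynamics already requires a trillion integration steps.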

The variety of approaches developed is illustrated by the two
community wide experiments CASP (*Critical Assessment of Techniques
for Protein Structure Prediction*; http://) and CAPRI (*Critical Assessment of Prediction of Interactions*;
http://).

As illustrated by the previous discussion, modeling macro-molecules touches upon biology, physics and chemistry, as well as mathematics and computer science. In the following, we present the topics investigated within ABS.

The research conducted by ABS focuses on three main directions in Computational Structural Biology (CSB), together with the associated methodological developments:

– Modeling interfaces and contacts,

– Modeling macro-molecular assemblies,

– Modeling the flexibility of macro-molecules,

– Algorithmic foundations.

**Keywords:** Docking, interfaces, protein complexes, structural alphabets,
scoring functions, Voronoi diagrams, arrangements of balls.

The Protein Data Bank, http://*interacting* with atoms of the second
one. Understanding the structure of interfaces is central to
understanding biological complexes and thus the function of biological
molecules. Yet, in spite of almost three decades
of investigations, the basic principles guiding the formation of
interfaces and accounting for their stability are unknown.
Current investigations follow two routes.
From the experimental perspective, directed
mutagenesis enables one to quantify the energetic importance of
residues, important residues being termed *hot* residues. Such
studies recently evidenced the *modular* architecture of
interfaces.
From the modeling perspective, the main issue consists of guessing the
hot residues from sequence and/or structural information.

The description of interfaces is also of special interest to improve
*scoring functions*. By scoring function, two things are meant:
either a function which assigns to a complex a quantity homogeneous to
a free energy change

Describing interfaces poses problems in two settings: static and dynamic.

In the static setting, one seeks the minimalist geometric model
providing a relevant bio-physical signal. A first step in doing so
consists of identifying interface atoms, so as to relate the geometry and
the bio-chemistry at the interface level.
To elaborate at the atomic level, one seeks a structural alphabet
encoding the spatial structure of proteins. At the side-chain and
backbone level, such an alphabet has been proposed.
At the atomic level, and in spite of recent
observations on the local structure of the neighborhood of a given
atom, no such alphabet is known. Specific
important local conformations are known, though. One of them is the
so-called dehydron structure, which is an under-desolvated hydrogen
bond – a property that can be directly inferred from the spatial
configuration of the

In the dynamic setting, one wishes to understand whether selected (hot) residues exhibit specific dynamic properties, so as to serve as anchors in a binding process. More generally, any significant observation raised in the static setting deserves investigations in the dynamic setting, so as to assess its stability. Such questions are also related to the problem of correlated motions, which we discuss next.

**Keywords:** Macro-molecular assembly, reconstruction by data
integration, proteomics, modeling with uncertainties, curved Voronoi
diagrams, topological persistence.

Large protein assemblies such as the Nuclear Pore Complex (NPC),
chaperonin cavities, the proteasome or ATP synthases, to name a few,
are key to numerous biological functions. To improve our
understanding of these functions, one would ideally like to build and
animate atomic models of these molecular machines. However, this task
is especially tough, due to their size and their plasticity, but also
due to the flexibility of the proteins involved.
In a sense, the modeling challenges arising in this context are
different from those faced for binary docking, and also from those
encountered for intermediate size complexes, which are often amenable
to a combination of (cryo-EM) image analysis and classical docking.
To face these new challenges, an emerging paradigm is that of
reconstruction by data integration. In a
nutshell, the strategy is reminiscent of NMR and consists of mixing
experimental data from a variety of sources, so as to find out the
model(s) best complying with the data.
This strategy has been used in particular to propose plausible models
of the Nuclear Pore Complex, the largest assembly
known to date in the eukaryotic cell, and consisting of 456 protein
*instances* of 30 *types*.

Reconstruction by data integration requires three ingredients. First,
a parametrized model must be adopted, typically a collection of balls
to model a protein with pseudo-atoms. Second, as in NMR, a functional
measuring the agreement between a model and the data must be
chosen. In previous work, this functional is based upon *restraints*, namely penalties associated with the experimental data.
Third, an optimization scheme must be selected.
The design of restraints is notoriously challenging, due to the
ambiguous nature and/or the noise level of the data.
For example, Tandem Affinity Purification (TAP) gives access to a *pullout*, i.e. a list of protein types which are known to interact
with one tagged protein type, but no information on the number of
complexes or on the stoichiometry of protein types within a complex
is provided.
In cryo-EM, the envelope enclosing an assembly is often imprecisely
defined, in particular in regions of low density. For immuno-EM
labelling experiments, positional uncertainties arise from the
microscope resolution.
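The three ingredients can be illustrated with a deliberately small Python sketch: one ball per protein (the model), a sum of quadratic penalties (the functional), and a greedy random search (the optimization scheme). Everything here is an assumption made for illustration: the two restraint types, the quadratic form of the penalties, the function names and the search strategy are hypothetical, and are not those of any published reconstruction.

```python
import math
import random

def score(centers, radii, contacts, weight=1.0):
    """Sum of restraint penalties for a model made of one ball per protein.

    centers:  list of (x, y, z) ball centers
    radii:    list of ball radii
    contacts: set of (i, j) pairs, i < j, expected to touch (e.g. from a pullout)
    """
    total = 0.0
    n = len(centers)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(centers[i], centers[j])
            touch = radii[i] + radii[j]
            if d < touch:              # excluded volume: the balls overlap
                total += weight * (touch - d) ** 2
            elif (i, j) in contacts:   # contact restraint: the balls are too far apart
                total += weight * (d - touch) ** 2
    return total

def optimize(centers, radii, contacts, steps=2000, step_size=1.0, seed=0):
    """Greedy random search: keep a random move only if it lowers the score."""
    rng = random.Random(seed)
    current = [tuple(c) for c in centers]
    best = score(current, radii, contacts)
    for _ in range(steps):
        k = rng.randrange(len(current))
        moved = tuple(x + rng.uniform(-step_size, step_size) for x in current[k])
        trial = current[:k] + [moved] + current[k + 1:]
        s = score(trial, radii, contacts)
        if s < best:
            current, best = trial, s
    return current, best

start = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
radii = [1.0, 1.0, 1.0]
contacts = {(0, 1), (1, 2)}
model, final = optimize(start, radii, contacts)
print(score(start, radii, contacts), final)
```

Because the functional is in general non-convex, a greedy search such as this one may stop in a local minimum; different seeds may therefore yield different models with comparable scores.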

These uncertainties, coupled with the complexity of the functional
being optimized, which in general is non-convex, have two
consequences.
First, it is impossible to single out a unique reconstruction, and a
set of plausible reconstructions must be considered. As an example,
1000 plausible models of the NPC have been reported.
Interestingly, averaging the positions of all
balls of a particular protein type across these models resulted in 30
so-called *probability density maps*, each such map encoding the
probability of presence of a particular protein type at a particular
location in the NPC.
Second, the assessment of all models (individual and averaged) is
nontrivial. In particular, the lack of straightforward statistical
analysis of the individual models and the absence of assessment for
the averaged models are detrimental to the mechanistic exploitation of
the reconstruction results. At this stage, such models therefore
remain qualitative.
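One possible reading of such a probability density map is a voxel grid storing, for each cell, the fraction of models in which a ball of the given protein type falls in that cell. The sketch below follows this reading; the voxel size, the data layout and the function name are assumptions, and the construction used for the NPC models may differ.

```python
from collections import Counter

def density_map(models, protein_type, voxel=10.0):
    """Fraction of models placing a ball of `protein_type` in each voxel.

    models: list of models; each model is a list of (type, (x, y, z)) balls.
    Returns {voxel_index: frequency}, voxel_index being integer grid coordinates.
    """
    counts = Counter()
    for model in models:
        occupied = {
            tuple(int(c // voxel) for c in center)
            for kind, center in model
            if kind == protein_type
        }
        counts.update(occupied)  # each model votes at most once per voxel
    return {cell: n / len(models) for cell, n in counts.items()}

# Hypothetical toy ensemble of three models.
models = [
    [("A", (1.0, 1.0, 1.0)), ("B", (25.0, 0.0, 0.0))],
    [("A", (2.0, 3.0, 4.0))],
    [("A", (15.0, 0.0, 0.0))],
]
print(density_map(models, "A"))
```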

**Keywords:** Folding, docking, energy landscapes, induced fit,
molecular dynamics, conformers, conformer ensembles, point clouds,
reconstruction, shape learning, Morse theory.

Proteins in vivo vibrate at various frequencies: high frequencies
correspond to small amplitude deformations of chemical bonds, while
low frequencies characterize more global deformations. This
flexibility contributes to the entropy, and thus to the *free energy*, of
the system *protein - solvent*. From the experimental standpoint,
NMR studies generate ensembles of conformations, called *conformers*, and so do molecular dynamics (MD) simulations.
Of particular interest while investigating flexibility is the notion
of correlated motion. Intuitively, when a protein is folded, all
atomic movements must be correlated, a constraint which gets
alleviated when the protein unfolds since the steric constraints get
relaxed *diffusion - conformer
selection - induced fit* complex formation model.

Parameterizing these correlated motions, describing the corresponding energy landscapes, as well as handling collections of conformations pose challenging algorithmic problems.

At the side-chain level, the question of improving rotamer libraries is still of interest. This question is essentially a clustering problem in the parameter space describing the side-chain conformations.

At the atomic level, flexibility is essentially investigated by resorting to methods based on a classical potential energy (molecular dynamics), and to (inverse) kinematics. A molecular dynamics simulation provides a point cloud sampling the conformational landscape of the molecular system investigated, as each step in the simulation corresponds to one point in the parameter space describing the system (the conformational space). The standard methodology to analyze such a point cloud consists of resorting to normal modes. Recently, though, more elaborate methods resorting to more local analysis, to Morse theory, and to the analysis of meta-stable states of time series have been proposed.

**Keywords:** Computational geometry, computational topology,
optimization, data analysis.

Making a stride towards a better understanding of the biophysical questions discussed in the previous sections requires various methodological developments, which we briefly discuss now.

In modeling interfaces and contacts, one may favor geometric or topological information.

On the geometric side, the problem of modeling contacts at the atomic
level is tantamount to encoding multi-body relations between an atom
and its neighbors. On the one hand, one may use an encoding of
neighborhoods based on geometric constructions such as Voronoi
diagrams (affine or curved) or arrangements of balls. On the other
hand, one may resort to clustering strategies in higher dimensional
spaces, as the

On the topological side, one may favor constructions which remain
stable if each atom in a structure *retains* the same neighbors,
even though the 3D positions of these neighbors change to some
extent. This process is observed in flexible docking cases, and calls
for the development of methods to encode and compare shapes undergoing
tame geometric deformations.

In dealing with large assemblies, a number of methodological developments are called for.

On the experimental side, of particular interest is the disambiguation of proteomics signals. For example, TAP and mass spectrometry data call for the development of combinatorial algorithms aiming at unraveling pairwise contacts between proteins within an assembly. Likewise, density maps coming from electron microscopy, which are often of intermediate resolution (5-10Å), call for the development of noise resilient segmentation and interpretation algorithms. The results produced by such algorithms can further be used to guide the docking of high-resolution crystal structures into maps.

As for modeling, two classes of developments are particularly stimulating. The first one is concerned with the design of algorithms performing reconstruction by data integration, a process reminiscent of non-convex optimization. The second one encompasses assessment methods, in order to single out the reconstructions which best comply with the experimental data. For that endeavor, the development of geometric and topological models accommodating uncertainties is particularly important.

Given a sampling on an energy landscape, a number of fundamental issues actually arise: how does the point cloud describe the topography of the energy landscape (a question reminiscent of Morse theory)? Can one infer the effective number of degrees of freedom of the system over the simulation, and is this number varying? Answers to these questions would be of major interest to refine our understanding of folding and docking, with applications to the prediction of structural properties. It should be noted in passing that such questions are probably related to modeling phase transitions in statistical physics, where geometric and topological methods are being used.

From an algorithmic standpoint, such questions are reminiscent of
*shape learning*. Given a collection of samples on an (unknown) *model*, *learning* consists of guessing the model from the samples
– the result of this process may be called the *reconstruction*. In doing so, two types of guarantees are sought:
topologically speaking, the reconstruction and the model should
(ideally!) be isotopic; geometrically speaking, their Hausdorff
distance should be small.
Motivated by applications in Computer Aided Geometric Design, surface
reconstruction triggered a major activity in the Computational
Geometry community over the past ten years.
Aside from applications, reconstruction
raises a number of deep issues:
the study of distance functions to the model and to the samples,
and their comparison; the study of Morse-like constructions stemming from distance
functions to points; the analysis of topological invariants of the model and the samples,
and their comparison.
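For finite point sets, the geometric guarantee mentioned above can be made concrete. The following sketch computes the Hausdorff distance between two point clouds by brute force; the function names are hypothetical, and the quadratic cost makes it suitable for small examples only.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

samples = [(0.0, 0.0), (1.0, 0.0)]
reconstruction = [(0.0, 0.0), (1.0, 0.0), (1.0, 3.0)]
print(hausdorff(samples, reconstruction))
```

The example also shows why both directions matter: every sample lies on the reconstruction (directed distance 0), yet the reconstruction has a point far from all samples.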

*Structural Bioinformatics Library*

Keywords: Structural Biology - Biophysics - Software architecture

Functional Description: The SBL is a generic C++/Python cross-platform software library targeting complex problems in structural bioinformatics. Its tenet is based on a modular design offering a rich and versatile framework allowing the development of novel applications requiring well specified complex operations, without compromising robustness and performance.

More specifically, the SBL involves four software components (1-4 hereafter). For end-users, the SBL provides ready to use, state-of-the-art (1) applications to handle molecular models defined by unions of balls, to deal with molecular flexibility, and to model macro-molecular assemblies. These applications can also be combined to tackle integrated analysis problems. For developers, the SBL provides a broad C++ toolbox with modular design, involving core (2) algorithms, (3) biophysical models, and (4) modules, the latter being especially suited to develop novel applications. The SBL comes with a thorough documentation consisting of user and reference manuals, and a Bugzilla platform to handle community feedback.

Release Functional Description: In 2017, major efforts targeted two points. First, the simplification of installation procedures. Second, the development of packages revolving around molecular flexibility at large: representations in internal and Cartesian coordinates, generic representation of molecular mechanics force fields (and computation of gradients), and exploration algorithms for conformational spaces.

Contact: Frédéric Cazals

Publication: The Structural Bioinformatics Library: modeling in biomolecular science and beyond

**Keywords:** docking, scoring, interfaces, protein complexes, Voronoi diagrams,
arrangements of balls.

In collaboration with P. Boudinot (INRA Jouy-en-Josas) and M-P. Lefranc (University of Montpellier 2).

Antibody-antigen complexes challenge our understanding, as analyses to date failed to unveil the key determinants of binding affinity and interaction specificity. In this work, we partially fill this gap based on novel quantitative analyses using two standardized databases, the IMGT/3Dstructure-DB and the structure affinity benchmark.

First, we introduce a statistical analysis of interfaces which enables the classification of ligand types (protein, peptide, chemical; cross-validated classification error of 9.6%), and yields binding affinity predictions of unprecedented accuracy (median absolute error of 0.878 kcal/mol). Second, we exploit the contributions made by CDRs in terms of position at the interface and atomic packing properties to show that in general, VH CDR3 and VL CDR3 make dominant contributions to the binding affinity, a fact also shown to be consistent with the enthalpy - entropy compensation associated with pre-configuration of CDR3. Our work suggests that the affinity prediction problem could be solved from databases of high resolution crystal structures of complexes with known affinity.

In collaboration with S. Fleischer (1. Charité University Medicine Berlin, Berlin, Germany), S. Ries (2. Deutsches Rheuma-Forschungszentrum Berlin, Berlin, Germany), P. Shen (2.), G.R. Burmester (1.), T. Dörner (1.), S. Fillatreau (2., Institut Necker-Enfants Malades, Université Paris Descartes, IHP Hôpital Necker Enfants Malades).

Rheumatoid arthritis (RA) is associated with abnormal B-cell functions
implicating antibody-dependent and -independent mechanisms. B cells
have emerged as important cytokine-producing cells, and cytokines are
well-known drivers of RA pathogenesis. To identify novel
cytokine-mediated B-cell functions in RA, in this work
we comprehensively analysed the
capacity of B cells from RA patients with an inadequate response to
disease modifying anti-rheumatic drugs to produce cytokines in
comparison with healthy donors (HD). RA B cells displayed a
constitutively higher production of the pathogenic factors interleukin
(IL)-8 and Gro-

**Keywords:** macro-molecular assembly, reconstruction by data integration,
proteomics, mass spectrometry, modeling with uncertainties, connectivity inference.

In collaboration with N. Cohen (LRI, UMR de l'Université Paris-Sud et du CNRS), F. Havet (Université Côte d'Azur, I3S, UMR de l'Université Nice Sophia et du CNRS), I. Sau (LIRMM, UMR de l'Université Montpellier et du CNRS, and Universidade Federal do Ceará, Brazil).

The *connectivity inference* problem for native mass spectrometry aims at
finding the most plausible pairwise contacts between the individual
subunits of a macro-molecular assembly, given the composition of overlapping
oligomers.
The associated combinatorial optimization problem consists in determining a minimal-cardinality set of contacts (edges) such that all the subunits of each oligomer are “connected” (each oligomer must induce a connected graph).
We studied the general inference problem, which considers more general properties of oligomers.
For this new problem, we are given a list of possible topologies (graphs) for each oligomer and we aim at minimizing the total number of contacts between subunits.
In terms of graphs, we are given a family of subgraphs that can match the structure of the oligomers.
These new constraints reflect biophysical properties: a subunit has a limited number of neighbors (bounded maximum degree of the subgraphs), selected contacts are already known (a given subgraph contained in the complex), etc.
We prove that the problem is NP-complete (no polynomial time algorithm, unless P = NP) for almost all cases.
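To fix ideas, the basic connectivity inference problem (each oligomer must induce a connected graph) can be solved on tiny instances by exhaustive search. The sketch below is a brute-force illustration, not one of the algorithms studied in this work; function names and the toy instance are hypothetical, and the running time is exponential in the number of candidate contacts, in line with the hardness results.

```python
from itertools import combinations

def is_connected(vertices, edges):
    """True if `vertices` induce a connected graph using only edges inside them."""
    vertices = set(vertices)
    if not vertices:
        return True
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u in vertices and v in vertices:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:                      # depth-first traversal from `start`
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return seen == vertices

def minimal_contacts(subunits, oligomers):
    """Smallest edge set making every oligomer induce a connected graph.

    Exhaustive search over edge subsets by increasing size: tiny instances only.
    """
    candidates = list(combinations(sorted(subunits), 2))
    for size in range(len(candidates) + 1):
        for edges in combinations(candidates, size):
            if all(is_connected(o, edges) for o in oligomers):
                return list(edges)
    return None

# Hypothetical assembly with four subunits and three overlapping oligomers.
oligomers = [{"A", "B"}, {"B", "C"}, {"A", "B", "C", "D"}]
solution = minimal_contacts("ABCD", oligomers)
print(solution)
```

Note that the minimum is generally not unique: here subunit D may be attached to any of A, B or C, which is one reason why additional constraints on oligomer topologies are of interest.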

**Keywords:** protein, flexibility, collective coordinate,
conformational sampling, dimensionality reduction.

No new result on this topic in 2017.

**Keywords:** Computational geometry, computational topology,
optimization, data analysis.

Making a stride towards a better understanding of the biophysical questions discussed in the previous sections requires various methodological developments discussed below.

In collaboration with N. Lascano (Universidad de Buenos Aires, Argentina, Université Côte d'Azur, and Inria Sophia Antipolis - Méditerranée, EPI ATHENA), G. Gallardo (2. Université Côte d'Azur and Inria Sophia Antipolis - Méditerranée, EPI ATHENA), D. Wassermann (2).

Finding the common structural brain connectivity network for a given population is an open problem, crucial for current neuro-science. Recent evidence suggests there is a tightly connected network shared between humans. Obtaining this network will, among many advantages, allow us to focus cognitive and clinical analyses on common connections, thus increasing their statistical power. In turn, knowledge about the common network will facilitate novel analyses to understand the structure-function relationship in the brain. In this work, we present a new algorithm for computing the core structural connectivity network of a subject sample, combining graph theory and statistics. Our algorithm works in accordance with novel evidence on brain topology. We analyze the problem theoretically and prove its complexity. Using 309 subjects, we show its advantages when used as a feature selection for connectivity analysis on populations, outperforming the current approaches.

In collaboration with P. Bonami (LIF, UMR d'Aix-Marseille Université et du CNRS, and IBM ILOG CPLEX, Madrid), Y. Vaxès (LIF, UMR d'Aix-Marseille Université et du CNRS).

Network operators must satisfy some Quality of Service requirements for their clients. One of the most important parameters in telecommunication networks is the end-to-end delay of a unit of flow between a source node and a destination node. Given a network and a set of source destination pairs (connections), we consider the problem of maximizing the sum of the flow under proportional delay constraints. In this paper, the delay for crossing a link is proportional to the total flow crossing this link. If a connection supports non-zero flow, then the sum of the delays along any path corresponding to that connection must be lower than a given bound. The delay constraints are on-off constraints: if a connection carries zero flow, then there is no constraint for that connection. The difficulty of the problem comes from the choice of the connections supporting non-zero flow. We first prove a general approximation ratio using linear programming for a variant of the problem. We then give a linear time 2-approximation algorithm when the network is a path. Finally, we present a Polynomial Time Approximation Scheme when the graph of intersections of the paths has bounded treewidth.

Clustering is a fundamental problem in data science, yet the variety of clustering methods and their sensitivity to parameters make clustering hard. To analyze the stability of a given clustering algorithm while varying its parameters, and to compare clusters yielded by different algorithms, several comparison schemes based on matchings, information theory and various indices (Rand, Jaccard) have been developed. In this work, we go beyond these by providing a novel class of methods computing meta-clusters within each clustering: a meta-cluster is a group of clusters, together with a matching between these. Let the intersection graph of two clusterings be the edge-weighted bipartite graph in which the nodes represent the clusters, the edges represent the non-empty intersections between two clusters, and the weight of an edge is the number of common items. We introduce the so-called D-family-matching problem on intersection graphs, with D the upper-bound on the diameter of the graph induced by the clusters of any meta-cluster. First, we prove NP-completeness results and the unbounded approximation ratio of simple strategies. Second, we design exact polynomial time dynamic programming algorithms for some classes of graphs (in particular trees). Then, we propose efficient spanning-tree based algorithms for general graphs. Our experiments illustrate the role of D as a scale parameter providing information on the relationship between clusters within a clustering and in-between two clusterings. They also show the advantages of our built-in mapping over classical cluster comparison measures such as the variation of information (VI).
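The intersection graph defined above is straightforward to build. The sketch below constructs it for two clusterings given as lists of sets; the function name and the dictionary-based edge representation are choices made for illustration, and the D-family-matching algorithms themselves are not reproduced here.

```python
def intersection_graph(clustering_a, clustering_b):
    """Edge-weighted bipartite intersection graph of two clusterings.

    Each clustering is a list of sets of items. Returns {(i, j): weight}, where
    i indexes a cluster of clustering_a, j a cluster of clustering_b, and weight
    is the number of items the two clusters share (non-empty intersections only).
    """
    edges = {}
    for i, ca in enumerate(clustering_a):
        for j, cb in enumerate(clustering_b):
            common = len(ca & cb)
            if common:
                edges[(i, j)] = common
    return edges

# Two hypothetical clusterings of the items 1..5.
first = [{1, 2, 3}, {4, 5}]
second = [{1, 2}, {3, 4}, {5}]
print(intersection_graph(first, second))
```

When both clusterings partition the same item set, the edge weights sum to the number of items, which gives a quick sanity check on the construction.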

Software in structural bioinformatics has mainly been application driven. To favor practitioners seeking off-the-shelf applications, but also developers seeking advanced building blocks to develop novel applications, we undertook the design of the Structural Bioinformatics Library (SBL), a generic C++/Python cross-platform software library targeting complex problems in structural bioinformatics. Its tenet is based on a modular design offering a rich and versatile framework allowing the development of novel applications requiring well specified complex operations, without compromising robustness and performance.

The SBL is available from http://

See also the section **New Software and Platforms**.

In this section, we describe the collaboration between ABS and MS Vision
(http://

This collaboration is funded by the Instituts Carnots
(http://

Protein complexes underlie most biological functions, so that studying such complexes in native conditions (intact molecular species taken in solution) is of paramount importance in biology and medicine. Unfortunately, the two leading experimental techniques to date, X-ray crystallography and cryo-electron microscopy, involve aggressive sample preparation (sample crystallization and sample freezing in amorphous ice, respectively) which may damage the structures and/or create artifacts. These experimental constraints justify the use of mass spectrometry (MS) to study biomolecules and their complexes under native conditions, using electrospray ionization (ESI), a soft ionization technique developed by John Fenn (Nobel prize in chemistry, 2002). MS actually delivers information on the masses of the molecular species studied, from which further information on the stoichiometry, topology and contacts between subunits can be inferred. Thanks to ESI, MS is expected to play a pivotal role in biology to unravel the structure of macromolecular complexes underlying all major biological processes, in medicine and biotechnology to understand the complex patterns of molecules involved in pathways, and also in biotechnologies for quality checks.

A mass spectrometer delivers a mass spectrum, i.e. a histogram representing the relative abundance of the ions (ionized proteins or protein complexes in our case), as a function of their mass-to-charge (m/z) ratio. Deconvoluting a mass spectrum means transforming it into a human readable mass histogram. Due to the nature of the ESI process (i.e. the inclusion of solvent and various other molecules) and the intrinsic variability of the studied biomolecules in native conditions, the interpretation of such spectra is delicate. Methods currently used are of heuristic nature, failing to satisfactorily handle the aforementioned difficulties. The goal of this collaboration is to develop optimal algorithms and the associated software to fill the critical gap of mass spectra deconvolution. The benefits for the analyst will be twofold, namely time savings, and the identification of previously undetected components. Upon making progress on the deconvolution problem, the collaboration will be expanded on the geometric and topological modeling of large macro-molecular assemblies, a topic to which ABS recently made significant contributions.
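In the idealized, noise-free case, the relation between a mass and its peaks can be written down directly. The sketch below assumes positive-mode ESI where an ion of mass M carrying z protons appears at m/z = (M + z·m_p)/z, and recovers the charge and the mass from two adjacent charge-state peaks of the same species. It is a textbook illustration under these assumptions, with hypothetical function names, and not the deconvolution method targeted by the collaboration; real spectra (adducts, overlapping species, noise) are precisely what makes the problem hard.

```python
PROTON_MASS = 1.007276  # mass of a proton, in daltons

def mz(mass, charge):
    """m/z of a species of given mass (Da) carrying `charge` protons."""
    return (mass + charge * PROTON_MASS) / charge

def mass_from_adjacent_peaks(mz_low, mz_high):
    """Infer (charge, mass) from two adjacent charge-state peaks of one species.

    mz_low is the peak with charge z + 1, mz_high the peak with charge z.
    From mz_high - m_p = M / z and mz_low - m_p = M / (z + 1), one gets
    z = (mz_low - m_p) / (mz_high - mz_low).
    """
    z = round((mz_low - PROTON_MASS) / (mz_high - mz_low))
    return z, z * (mz_high - PROTON_MASS)

# Hypothetical 50 kDa species observed at charges 13 and 12.
z, mass = mass_from_adjacent_peaks(mz(50000.0, 13), mz(50000.0, 12))
print(z, round(mass, 3))
```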

Together with J. Cortés (LAAS/CNRS, Toulouse), and C. Robert
(IBPC/CNRS, Paris), we launched and have been organizing the
Winter Schools series *Algorithms in Structural Bio-informatics*.
These schools are meant to train PhD students and post-docs on
advanced algorithmic techniques in structural biology.
The 2017 Edition, which took place at the CNRS center in Cargese, focused
on
*Protein design*, see https://

– Frédéric Cazals was member of the following program committees:

Symposium On Geometry Processing

Symposium on Solid and Physical Modeling

Intelligent Systems for Molecular Biology (ISMB), PC member of Protein Interactions & Molecular Networks

ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics

JOBIM

Journal of mathematical biology

Bioinformatics

Journal of computational chemistry

– Frédéric Cazals gave the following invited talks:

*Beyond two-sample-tests: localizing data discrepancies in high-dimensional spaces*, NIPS workshop of Topological Data Analysis, Los Angeles, December 2017.

*Modeling in structural bioinformatics:
the triptych structure - dynamics - function*,
GDR Bioinformatique moléculaire, Paris, November 2017.

*Modèles géométriques pour la prédiction des interactions
macro-moléculaires*, seminar for the course *Géométrie
algorithmique Données, Modèles, Programmes*, by Jean-Daniel
Boissonnat, Chaire d'informatique et sciences numériques, Collège
de France, March 2017.

– Frédéric Cazals:

2010-.... Member of the steering committee of the *GDR
Bioinformatique Moléculaire*, for the *Structure and
macro-molecular interactions* theme.

2017-.... Co-chair, with Yann Ponty, of the working group
(groupe de travail) *GT MASIM - Méthodes Algorithmiques pour les
Structures et Interactions Macromoléculaires*, within the *GDR
de BIoinformatique Moléculaire* (GDR BIM,
http://

– Frédéric Cazals:

2017-.... President of the *Comité de suivi doctoral*
(CSD), Inria Sophia Antipolis - Méditerranée. The CSD oversees all aspects of PhD
students' life within Inria Sophia Antipolis.

– Dorian Mazauric:

2016-2019. Member of the *Comité de Centre*, Inria Sophia Antipolis - Méditerranée.

Master: Frédéric Cazals (Inria ABS) and Frédéric Chazal (Inria Saclay),
*Foundations of Geometric Methods in Data Analysis*, Data Sciences
Program, Department of Applied Mathematics, Ecole Centrale
Paris. (http://

**PhD thesis, ongoing, 3rd year.** Romain Tetley, *Structural
alignments: beyond the rigid case*. Université Côte d'Azur.
Under the supervision of Frédéric Cazals.

**PhD thesis, ongoing, 3rd year.** Augustin Chevallier,
*Sampling biomolecular systems*. Université Côte d'Azur.
Under the supervision of Frédéric Cazals.

**PhD thesis, ongoing, 1st year.** Denys Bulavka,
*Modeling macro-molecular motions*. Université Côte d'Azur.
Under the supervision of Frédéric Cazals.

**PhD thesis, ongoing, 1st year.** Méliné Simsir, *Modeling drug efflux by Patched*. Université Côte d'Azur.
Thesis co-supervised by Frédéric Cazals and Isabelle Mus-Veteau, IPMC/CNRS.

**Postdoctoral research of Rémi Watrigant, 2016 - 2017.**
Projet de Recherche Exploratoire (Inria).
*Improving inference algorithms for macromolecular structure determination*.
Under the supervision of Dorian Mazauric and Frédéric Havet (Inria
COATI project-team).

– Frédéric Cazals:

Clément Viricel, University of Toulouse, December 2017.
Rapporteur on the PhD thesis
*Contributions au développement d’outils computationnels
de design de protéines : méthodes et algorithmes de
comptage avec garantie*.
Advisors: T. Schiex and S. Barbe.

This section describes the activities of Dorian Mazauric (member of the popularization committee of Inria Sophia Antipolis - Méditerranée).

– **Funding**.

Coordinator of the project GALEJADE (*Graphes et ALgorithmes : Ensembles de Jeux À Destination des Écoliers*) funded by Fondation Blaise Pascal (2017 - 2018).
We aim to develop an educational kit that lets primary school pupils play with the notions of graphs and algorithms.
We also offer talks for the general public.

– **Resources**.

13 posters - *Transmission de pensée - La magie du binaire*.

2 posters - *Tour de cartes - La magie des graphes et du binaire*.

– **Activities**.

Training, with Laurent Giauffret, of 30 teachers (cycle 1) from the Académie de Nice (*Jeux avec les graphes et les algorithmes*).

Stage MathC2+: half-day activities for 40 high school students (Borůvka's algorithm for the minimum spanning tree problem, played on “real” graphs constructed with plastic hoops and slats).

Fête de la Science 2017

Village des sciences et de l'innovation au Palais des Congrès d'Antibes Juan-les-Pins.

Talks (one classe préparatoire, two classes of seconde, one class of sixième, two classes of CM1).

4 conferences in high schools (Aix-en-Provence, Antibes, Grasse, Miramas), dispositif Science Culture PACA.

One talk in a secondary school (French *collège*) in Cagnes-sur-Mer.