Our project addresses a central question in bioinformatics, namely the molecular levels of organization in cells. The biological function of macromolecules such as proteins and nucleic acids relies on their dynamic structural nature and their ability to interact with many different partners. Folding and docking therefore remain major issues in modern structural biology. We currently concentrate our efforts on structure, interactions, evolution and annotation, and aim to contribute to protein engineering and RNA design. With the recent development of molecular systems biology, which aims to integrate different levels of information, the study of protein and nucleic acid assemblies should provide a better understanding of the molecular processes and machinery at work in the cell, and our research extends to several related issues in systems biology.
On the one hand, we study and develop methodological approaches for dealing with macromolecular structures and annotation: the challenge is to develop abstract models that are computationally tractable and biologically relevant. Our approach puts a strong emphasis on the modeling of biological objects using classic formalisms in computer science (languages, trees, graphs...), occasionally decorated and/or weighted to capture features of interest. To that purpose, we rely on the wide array of skills present in our team in the fields of combinatorics, formal languages and discrete mathematics. The resulting models are usually designed to be amenable to a probabilistic analysis, which can be used to assess the relevance of models, or test general hypotheses.
On the other hand, once suitable models are established, we apply these computational approaches to several particular problems arising in fundamental molecular biology. One typically aims at designing new specialized algorithms and methods to efficiently compute properties of real biological objects. Tools of choice include exact optimization, relying heavily on dynamic programming, simulations, machine learning and discrete mathematics. As a whole, a common toolkit of computational methods is developed within the group. The trade-off between the biological accuracy of the model and its computational tractability or efficiency is addressed in close partnership with experimental biology groups. One outcome is to provide software or platform elements to predict either structures or structural and functional annotation. As members of the Inria community, we are part of the ADT BioSciences led by J. Nicolas, whose goal is to develop a global Inria Bioinformatics web portal.
Michael Levitt, our international collaborator of the ITSNAP
Associated team, was awarded the Nobel Prize in Chemistry for
the development of multiscale models for complex chemical
systems. The Nobel lecture is available at
http://
At the secondary structure level, we contributed novel generic techniques applicable to dynamic programming and statistical sampling, and applied them to design novel efficient algorithms for probing the conformational space. Another originality of our approach is that we cover a wide range of scales for RNA structure representation. For each scale (atomic, sequence, secondary and tertiary structure...) cutting-edge algorithmic strategies and accurate and efficient tools have been developed or are under development. This offers a new view on the complexity of RNA structure and function that will certainly provide valuable insights for biological studies.
3D modeling was supported by the Digiteo project Japarin-3D. Statistical potentials were supported by Carnage and Itsnap.
Common activity with J. Waldispühl (McGill).
Ever since the seminal work of Zuker and Stiegler, the field of RNA bioinformatics has been characterized by a strong emphasis on the secondary structure. This discrete abstraction of the 3D conformation of RNA has paved the way for the development of quantitative approaches in RNA computational biology, revealing unexpected connections between combinatorics and molecular biology. Using our strong background in enumerative combinatorics, we propose generic and efficient algorithms, both for sampling and counting structures using dynamic programming. These general techniques have been applied to study the sequence-structure relationship, the correction of pyrosequencing errors, and the efficient detection of multi-stable RNAs (riboswitches).
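As an illustration of counting structures by dynamic programming, here is a minimal sketch. It is not the team's algorithm: it assumes a simplified model in which a secondary structure is any set of non-crossing canonical or wobble base pairs, with at least `min_loop` unpaired bases inside each hairpin. The names `PAIRS`, `count_structures` and `min_loop` are chosen for this example only.

```python
from functools import lru_cache

# Base pairs allowed in this simplified model (Watson-Crick plus G-U wobble).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_structures(seq, min_loop=3):
    """Count the secondary structures compatible with seq.

    Decomposition on the first position i of an interval [i, j]:
    either i is unpaired, or i pairs with some k, which splits the
    interval into an inner part [i+1, k-1] and an outer part [k+1, j].
    """
    n = len(seq)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Empty interval or single base: only the empty structure.
        if j - i < 1:
            return 1
        total = count(i + 1, j)  # position i left unpaired
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                total += count(i + 1, k - 1) * count(k + 1, j)
        return total

    return count(0, n - 1)


if __name__ == "__main__":
    print(count_structures("GGGAAACCC"))
```

The same table of counts can drive a uniform random generation: at each step, the unpaired case and each pairing case are chosen with probability proportional to their contribution to the count.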
Joint project with S. Vialette (Marne-la-Vallée), J. Waldispühl (McGill) and Y. Zhang (Wuhan).
It is a natural pursuit to build on our understanding of the secondary structure to construct artificial RNAs performing predetermined functions, ultimately targeting therapeutic and synthetic biology applications. Towards this goal, a key element is the design of RNA sequences that fold into a predetermined secondary structure, according to established energy models (inverse-folding problem). Quite surprisingly, and despite two decades of studies, the computational complexity of the inverse-folding problem is currently unknown.
Within our group, we offer a new methodology for this problem, based on weighted random generation and multidimensional Boltzmann sampling. Initially lifting the constraint of folding back into the target structure, we explored the random generation of sequences that are compatible with the target, using a probability distribution that exponentially favors sequences of high affinity towards the target. A simple posterior rejection step then selects the sequences that effectively fold back into the target, resulting in a global sampling pipeline whose performance is comparable to that of its competitors based on local search.
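The two-stage pipeline described above (weighted generation of compatible sequences, then rejection) can be sketched as follows. This is a deliberately simplified illustration, not the team's method: it assumes that each base pair of the target contributes an independent energy, with the illustrative values in `PAIR_ENERGY`, and it takes the folding predictor as a user-supplied function `fold`. All names and numerical values here are assumptions made for the example.

```python
import math
import random

# Illustrative pair energies (arbitrary units); NOT an established energy model.
PAIR_ENERGY = {"GC": -3.0, "CG": -3.0, "AU": -2.0, "UA": -2.0, "GU": -1.0, "UG": -1.0}


def parse_pairs(structure):
    """Map each opening position of a dot-bracket string to its closing position."""
    stack, pairs = [], {}
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            pairs[stack.pop()] = i
    return pairs


def sample_compatible(structure, kT=0.6, rng=random):
    """Draw a sequence compatible with the target, favoring low-energy pairs.

    Each paired position gets a base pair drawn with Boltzmann weight
    exp(-E / kT); unpaired positions are drawn uniformly.
    """
    seq = [rng.choice("ACGU") for _ in structure]
    options = list(PAIR_ENERGY)
    weights = [math.exp(-PAIR_ENERGY[p] / kT) for p in options]
    for i, j in parse_pairs(structure).items():
        pair = rng.choices(options, weights=weights)[0]
        seq[i], seq[j] = pair[0], pair[1]
    return "".join(seq)


def design(structure, fold, attempts=1000, rng=random):
    """Rejection step: keep the first sample that folds back into the target.

    `fold` is a hypothetical callable mapping a sequence to a dot-bracket
    string; any secondary structure predictor could be plugged in.
    """
    for _ in range(attempts):
        candidate = sample_compatible(structure, rng=rng)
        if fold(candidate) == structure:
            return candidate
    return None
```

In this sketch the bias towards the target is applied pair by pair; a realistic energy model couples neighbouring pairs, which is where a grammar-based weighted generation becomes necessary.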
Joint project with D. Barth (Versailles) and J. Cohen (Paris-Sud).
The modeling of large RNA 3D structures, that is, predicting the three-dimensional structure of a given RNA sequence, relies on two complementary approaches. The homology approach is used when the structure of a sequence homologous to the sequence of interest has already been resolved experimentally. The main problem is then to compute an alignment between the known structure and the sequence. The ab initio approach is required when no homologous structure is known for the sequence of interest (or for some parts of it). We work in both directions.
Despite being able to correctly model small globular proteins, the computational structural biology community still needs efficient force fields and scoring functions for prediction, as well as good sampling and dynamics strategies.
Our current and future efforts towards knowledge-based scoring function and ion location prediction have been described in .
Over the last two decades, a strong connection between robotics and computational structural biology has emerged, in which internal coordinates of proteins are interpreted as a kinematic linkage with rotatable bonds as joints and corresponding groups of atoms as links. Initially, only fragments of proteins, limited to tens of residues, were modeled as a kinematic linkage, but this approach has been extended to encompass (multi-domain) proteins. For RNA, progress in this direction has been made as well. A kinematics-based conformational sampling algorithm, KGS, for loops was recently developed, but it does not fully utilize the potential of a kinematic model. It breaks and recloses loops using six torsional degrees of freedom, which results in a finite number of solutions. The discrete nature of the solution set in the conformational space makes it difficult to optimize a target function with a gradient descent method. Our methods overcome this limitation by performing conformational sampling and optimization in a co-dimension 6 subspace. Fragments remain closed, but these methods are limited to proteins. Our objective is to extend this approach to nucleic acids and protein/nucleic acid complexes, with a view towards improving structure determination of nucleic acids and their complexes and in silico docking experiments of protein/RNA complexes. For that purpose, we have developed a generic strategy for differentiable statistical potentials that can be directly integrated in the procedure.
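One standard way to move within a constrained subspace to first order, offered here only as an illustrative sketch and not as the implementation described above, is to project a desired step (for instance the gradient of a target function with respect to the torsion angles) onto the null space of the Jacobian of the closure constraints. With six independent constraints and n torsional degrees of freedom, the remaining directions span a subspace of co-dimension 6. The function names and the tiny Jacobian in the test are assumptions made for the example.

```python
def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_rows(jacobian, tol=1e-12):
    """Orthonormalize the constraint gradients (modified Gram-Schmidt).

    Rows that are numerically dependent on previous ones are dropped.
    """
    basis = []
    for row in jacobian:
        v = list(row)
        for q in basis:
            c = dot(v, q)
            v = [a - c * b for a, b in zip(v, q)]
        norm = dot(v, v) ** 0.5
        if norm > tol:
            basis.append([a / norm for a in v])
    return basis


def project_to_null_space(jacobian, step):
    """Remove from `step` every component that would violate a constraint.

    The result s satisfies J s = 0 (up to rounding), so that, to first
    order, a move along s keeps the closure constraints satisfied.
    """
    s = list(step)
    for q in orthonormal_rows(jacobian):
        c = dot(s, q)
        s = [a - c * b for a, b in zip(s, q)]
    return s
```

Because the projected step is only valid to first order, a practical sampler would take small steps and re-evaluate the Jacobian at each new conformation.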
Results from in silico docking experiments will also directly benefit structure determination of complexes which, in turn, will provide structural insights into nucleic acid and protein/nucleic acid complexes. Starting from the small proof-of-concept single chain protein implementation of the KGS strategy, we have developed a robust preliminary implementation that can handle RNA and will be further developed to account for multi-chain molecules. Rasmus Fonseca, post-doctoral scholar in the project, is currently performing an extensive computational and biological validation.
String searching and pattern matching is a classical area in computer science, enhanced by potential applications to genomic sequences. In the Cpm/Spire community, the focus is on general string algorithms and associated data structures, together with their theoretical complexity. Our group specializes in a formalization based on languages, weighted by a probabilistic model. Team members share expertise in the enumeration and random generation of combinatorial sequences or structures that are admissible according to some given constraints. Special attention is paid to the actual computability of formulae and to the efficient design of structures, which may be reused in external software.
As a whole, motif detection in genomic sequences is a hot subject in computational biology that makes it possible to address key questions such as chromosome dynamics or annotation. This area is being renewed by high-throughput data and assembly issues. New constraints, such as energy conditions, or technology-dependent sequencing errors and amplification bias, must be introduced in the models. Another aim is to combine statistical sampling with a fragment-based approach for decomposing structures, such as the cycle decomposition used within F. Major's group. In the future, our methods for sampling and sequence data analysis should be extended to take into account such continuously evolving constraints.
Besides applications of analytic combinatorics to computational biology problems, the team addressed general combinatorial problems on words and fundamental issues on languages and data structures.
Molecular interactions often
involve specific motifs. One may cite protein-DNA (cis-regulation),
protein-protein (docking), RNA-RNA (miRNA, frameshift,
circularisation).
Motif detection combines an algorithmic search of potential sites
and a significance assessment. Significance assessment requires a
quantitative criterion. It is generally accepted that
the p-value is a reliable tool that outperforms older criteria such as the
z-score. Amib conducts long-term research on word
combinatorics. In recent years, a general scheme of derivation of analytic formulae for the p-value
under different constraints (
In the meantime, continuous sequences of overlapping words, commonly called clumps or clusters, turn out to be crucial in random word counting. Notably, they play a fundamental role in the Chen-Stein method of compound Poisson approximation. A first characterization was proposed by Nicodème et al., and this work is currently being extended.
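To make the notion of p-value for a motif concrete, the sketch below computes the exact probability that a given word occurs at least once in a random text of length n. It is a textbook automaton-based computation, not the analytic derivation developed by the team, and it assumes the simplest background model (independent letters with fixed probabilities). The function names are chosen for this example. Overlaps of the word with itself, which are the source of clumps, are handled by the automaton transitions.

```python
def kmp_automaton(word, alphabet):
    """Build the word-recognizing automaton.

    State s means "the longest prefix of `word` that is a suffix of the
    text read so far has length s". delta[s][a] is the next state.
    """
    m = len(word)
    delta = [{} for _ in range(m + 1)]
    for state in range(m + 1):
        for a in alphabet:
            s = word[:state] + a
            k = min(m, len(s))
            # Longest prefix of `word` that is a suffix of s.
            while k > 0 and s[-k:] != word[:k]:
                k -= 1
            delta[state][a] = k
    return delta


def occurrence_pvalue(word, n, probs):
    """P(word occurs at least once in a random text of length n).

    `probs` maps each letter to its probability (i.i.d. letters).
    State m (word found) is made absorbing.
    """
    m = len(word)
    delta = kmp_automaton(word, list(probs))
    dist = [0.0] * (m + 1)
    dist[0] = 1.0
    for _ in range(n):
        new = [0.0] * (m + 1)
        new[m] = dist[m]  # once found, stay found
        for s in range(m):
            if dist[s] == 0.0:
                continue
            for a, p in probs.items():
                new[delta[s][a]] += dist[s] * p
        dist = new
    return dist[m]


if __name__ == "__main__":
    uniform = {a: 0.25 for a in "ACGT"}
    print(occurrence_pvalue("AA", 3, uniform))
```

Replacing the absorbing state by a counter would give the distribution of the number of occurrences, at the cost of a larger state space.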
This research area is widened by new problems arising from de novo genome assembly or re-assembly. For example, unique mappability of short reads strongly depends on the repetition of words. Although the average values for the length have long been studied under different constraints, their distribution or profile remained unknown until a seminal paper provided formulae for binary tries. A collaboration has been started with Lob at Ecole Polytechnique to check these formulae on real data, namely Archaea genomes (internship of J. Moussu).
As a second example, numerous new assembly algorithms have recently appeared. Still, comparing the results of these different algorithms reveals significant differences for a given genome assembly. Clearly, strong constraints from the underlying technologies, leading to different data (size, confidence, ...), are one origin of the problems, and a deeper interpretation is needed in order to improve the algorithms and the confidence in the results. One objective is to develop a model of errors, including a statistical model, that takes into account the quality of data for the different technologies, and their volume. This is the subject of an international collaboration with V. Makeev's lab (IoGene, Moscow) and the Magnome project-team. Third, Next Generation Sequencing opens the way to the study of structural variants in the genome. Defining a probabilistic model that takes into account the main dependencies, such as the GC content, is a task of D. Iakovishina's thesis, in collaboration with V. Boeva (Curie Institute).
Analytical methods may fail when both sequential and structural constraints of sequences are to be modelled or, more generally, when molecular structures such as RNA structures have to be handled. The random generation of combinatorial objects is a natural alternative framework to assess the significance of observed phenomena. General and efficient techniques have been developed over the last decades to draw objects uniformly at random from an abstract specification. However, in the context of biological sequences and structures, the uniformity assumption becomes unrealistic, and one has to consider non-uniform distributions in order to derive relevant estimates. Typically, context-free grammars can handle certain kinds of long-range interactions such as base pairings in RNA secondary structures.
In 2005, a new paradigm appeared in ab initio secondary structure prediction: instead of formulating the problem as a classic optimization, this new approach uses statistical sampling within the space of solutions. Besides giving better, more robust results, it allows for a fruitful adaptation of tools and algorithms derived in a purely combinatorial setting. Indeed, we have recently made significant and original progress in this area, including combinatorial models for structures with pseudoknots. Our aim is to combine this paradigm with a fragment-based approach for decomposing structures, such as the cycle decomposition used within F. Major's group.
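The statistical sampling paradigm can be illustrated with a minimal sketch: a partition function is first computed by dynamic programming, then structures are drawn by a stochastic traceback, so that each structure is obtained with probability proportional to its Boltzmann weight. The energy model assumed here is a toy one (a constant energy `pair_energy` per base pair, no stacking or loop terms), and all names and parameter values are assumptions made for the example.

```python
import math
import random

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_table(seq, pair_energy=-1.0, kT=0.6, min_loop=3):
    """Z[i][j] = sum of Boltzmann weights of structures on seq[i:j] (half-open)."""
    n = len(seq)
    w = math.exp(-pair_energy / kT)  # weight contributed by one base pair
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = Z[i + 1][j]  # position i unpaired
            for k in range(i + min_loop + 1, j):
                if (seq[i], seq[k]) in PAIRS:
                    total += w * Z[i + 1][k] * Z[k + 1][j]
            Z[i][j] = total
    return Z, w


def sample_structure(seq, Z, w, min_loop=3, rng=random):
    """Stochastic traceback: draw one structure in dot-bracket notation."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq))]
    while stack:
        i, j = stack.pop()
        if j - i < 1:
            continue
        r = rng.random() * Z[i][j] - Z[i + 1][j]
        if r < 0:  # case "i unpaired"
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j):
            if (seq[i], seq[k]) in PAIRS:
                r -= w * Z[i + 1][k] * Z[k + 1][j]
                if r < 0:  # case "i pairs with k"
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k))
                    stack.append((k + 1, j))
                    break
        else:
            # Only reachable through rounding errors: leave i unpaired.
            stack.append((i + 1, j))
    return "".join(structure)


if __name__ == "__main__":
    sequence = "GGGAAACCC"
    table, weight = partition_table(sequence)
    print([sample_structure(sequence, table, weight) for _ in range(5)])
```

Setting `pair_energy` to zero gives every structure the same weight, and the sampler then reduces to uniform random generation.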
Besides, our work on random generation is also applied in a different field, namely software testing and model-checking, in a continuing collaboration with the Fortesse group at Lri.
The biological function of macromolecules such as proteins and nucleic acids relies on their dynamic structural nature and their ability to interact with many different partners. This is especially challenging, as structure flexibility is key, and multi-scale modelling and efficient code are essential.
Our project covers various aspects of biological macromolecule structure and interaction modelling and analysis. First, protein structure prediction is addressed through combinatorics. The dynamics of these types of structures is also studied using statistical and robotics-inspired strategies. Both provide a good starting point to perform 3D interaction modelling, for which accurate structure and dynamics are essential. Modelling is then raised to the cell level by studying large protein interaction networks and the dynamics of molecular pathways.
Our group benefits from a good collaboration network, mainly at Stanford University (USA), Hkust (Hong-Kong) and McGill (Canada). The computational expertise in this field of computational structural biology is represented in a few large groups in the world (e.g. Pande lab at Stanford, Baker lab at U. Washington) that have both dry and wet labs. We also contributed to the Capri experiment, organized by leading members of an international community we have been involved in for some time. At Inria, our interest in structural biology is shared by the Abs project-team. Work by D. Ritchie in the Orpailleur project-team led to a joint publication with T. Bourquard and J. Azé. Our activities are, however, now more centered on protein-nucleic acid interactions, multi-scale analysis, robotics-inspired strategies and machine learning than on protein-protein interactions, algorithms and geometry. We also share a common interest in large biomolecules and their dynamics with the Nano-D project-team and their adaptive sampling strategy. As a whole, we contribute to the development of geometric and machine learning strategies for macromolecular docking.
Protein structure prediction has been and still is extensively studied. Computational approaches have shown interesting results for globular proteins but transmembrane proteins remain a difficult case.
Transmembrane beta-barrel proteins (TMB) account for 20 to 30% of identified proteins in a genome but, due to difficulties with standard experimental techniques, they are only 2% of the RCSB Protein Data Bank. As TMB perform many vital functions, the prediction of their structure is a challenge for life sciences, while the small number of known structures prohibits knowledge-based methods for structure prediction.
As barrel proteins are strongly structured objects, model-based methodologies are an interesting alternative to these conventional methods. Jérôme Waldispühl's thesis at Lix opened this track for the common case where a protein folds respecting the order of the sequence, leaving a structure where each strand is bound to the preceding and succeeding ones. The matching constraints were expressed by a grammatical model, for which relatively simple dynamic programming schemes exist.
However, more sophisticated schemes are required when the arrangements of the strands along the barrel do not follow their order in the sequence, as is the case for Greek key or Jelly roll motifs. The prediction algorithm may then be driven by a permutation on the order of the bonded strands. In his thesis, Van Du Tran developed a methodology for compiling a given permutation into a dynamic programming scheme that may predict the folding of sequences into the corresponding TMB secondary structure. Polynomial complexity upper bounds follow from the calculated DP scheme. Better schemes were also investigated through tree decompositions of the graph that expresses constraints between strands in the barrel.
The efficiently obtained 3D structures provide a good model for further 3D and interaction analyses.
To better model complexes, various aspects of the scoring problem for protein-protein docking need to be addressed. It is also of great interest to introduce a hierarchical analysis of the original complex three-dimensional structures used for learning, obtained by clustering.
A protein-protein docking procedure traditionally consists of two successive tasks: a search algorithm generates a large number of candidate solutions, and then a scoring function is used to rank them in order to extract a native-like conformation. We demonstrated that, using Voronoi constructions and a defined set of parameters, we could optimize an accurate scoring function and interaction detection. We also focused on developing other geometric constructions for that purpose: being related to the Voronoi construction, the Laguerre tessellation was expected to better represent the physico-chemical properties of the partners. It also allows a fast computation without losing the intrinsic properties of the biological objects. We compared both constructions. We also worked on introducing a hierarchical analysis of the original complex three-dimensional structures used for learning, obtained by clustering. Using this clustering model, in combination with a strong emphasis on the design of efficient complex filters (collaborative filtering), we can optimize the scoring functions and get more accurate solutions.
We also decided to extend these techniques to the analysis of protein-nucleic acid complexes. The first preliminary developments and tests are performed by A. Guilhot (See figure ).
Faced with the inherent features of biological and biomedical data, researchers from the database and artificial intelligence communities have joined together to form a community dedicated to the study of the specific problems posed by integrating life sciences data. With the deluge of newly sequenced genomes and the amount of data produced by high-throughput approaches, the need to cross and compare massive and heterogeneous data is more important than ever to improve functional annotation and design biological networks. Challenges are numerous. One may cite the need to support scientists in performing and sharing complex and reproducible biological analyses. Special attention is paid to the more specific domains of scientific workflow management and biological data ranking. One aims at exploring the relationships between those two domains, from the investigation of various specific problems posed by ranking scientific workflows to the problem of considering consensus workflows.
Scientific workflow management systems are increasingly used to specify and manage bioinformatics experiments. Their programming model appeals to bioinformaticians, who use them to easily specify complex data processing pipelines. Such a model is underpinned by a graph structure, where nodes represent bioinformatics tasks and links represent the dataflow. As underlined both in a study and a review of existing approaches, the complexity of such graph structures is increasing over time, making them more difficult to share and reuse.
One of the major current challenges is thus to provide means to reduce the structural complexity of workflows while ensuring that any structural transformation will not have any impact on the executions of the transformed workflows, that is, preserving provenance.
We are addressing the increase in the number of available resources. The BioGuide project aims at helping users navigate the maze of available biological sources. More recently, a second problem was tackled: the number of answers returned by even a single queried biological resource may be too large for the user to deal with. We have provided solutions for ranking biological data. The main difficulty lies in considering various ranking criteria (recent data first, popular data first, curated data first...). Many approaches combine ranking criteria to design a ranking function, possibly leading to arbitrary choices in the way the criteria are combined. Instead, in collaboration with the University of Montreal, we have proposed to follow a median ranking approach named BioConsert (for generating Biological Consensus ranking with ties): considering as many rankings as there are ranking criteria for the same data set, and providing a consensus ranking that minimizes the disagreements between the input rankings. We have shown the benefit of using median ranking in several biological settings.
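The idea of a consensus ranking that minimizes disagreements can be illustrated on a toy case. The sketch below is not BioConsert: it assumes rankings without ties, measures disagreement with the classical Kendall tau distance (number of item pairs ordered differently), and finds an exact median by exhaustive search, which is only feasible for a handful of items. Function names and the gene identifiers in the usage example are made up for the illustration.

```python
from itertools import combinations, permutations


def kendall_tau(r1, r2):
    """Number of item pairs that the two rankings order differently."""
    pos1 = {x: i for i, x in enumerate(r1)}
    pos2 = {x: i for i, x in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )


def median_ranking(rankings):
    """Exact consensus by brute force over all permutations of the items.

    Returns a ranking minimizing the sum of Kendall tau distances to the
    input rankings. The cost grows factorially with the number of items.
    """
    items = rankings[0]
    best = min(
        permutations(items),
        key=lambda cand: sum(kendall_tau(cand, r) for r in rankings),
    )
    return list(best)


if __name__ == "__main__":
    by_date = ["g1", "g2", "g3"]
    by_popularity = ["g1", "g3", "g2"]
    by_curation = ["g2", "g1", "g3"]
    print(median_ranking([by_date, by_popularity, by_curation]))
```

For realistic data sets, exhaustive search has to be replaced by a heuristic or an approximation algorithm, and ties require a generalized distance.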
Additionally, in a close collaboration with the Institut Curie, we have also developed the GeneValorization tool that ranks a list of genes of interest given as input with respect to a set of keywords representing the context of study. Here the single ranking criterion considered for each gene is the number of publications in PubMed co-citing the gene name and the keywords. The tool is able to make use of the MeSH taxonomy when considering the keywords and the dictionary of gene names and aliases for the gene names.
Systems Biology involves the systematic study of complex interactions in biological systems using an integrative approach. The goal is to find new emergent properties that may arise from the systemic view in order to understand the wide variety of processes that happen in a biological system. Systems Biology activity can be seen as a cycle composed of theory, computational modelling to propose a hypothesis about a biological process, experimental validation, and use of the experimental results to refine or invalidate the computational model (or even the whole theory). During the past five years, new questions and research domains have been identified, and some members of the team have reoriented a part of their activities on these questions.
Three main types of problems have been studied: metabolic networks, signaling networks and, more recently, synthetic biology. Networks have become popular since many crucial problems coming from biology, medicine and pharmacology are nowadays stated in these terms: a great number of them stem from cancer and from the will to enhance our understanding of it in order to propose more efficient therapeutic strategies. Metabolism has received the most attention, since it concerns a large variety of topics and several methods have been proposed. Depending on the nature of the biological problem, several methods can be used: discrete deterministic, stochastic, combinatorial, up to continuous differential. Also, the recent rise of synthetic biology raises similar challenges, aiming for instance at improving the production of energy by means of biological systems or at obtaining more efficient drug treatments.
Elementary flux mode analysis is a powerful tool for the theoretical study of simple metabolic networks. However, when the networks are complex, the determination of elementary flux modes leads to a combinatorial explosion of their number, which makes it impossible to draw simple conclusions from their analysis. Our approach to this problem groups into a few classes the elementary flux modes that share a set of common reactions, called common motifs.
Signaling pathways involving G protein-coupled receptors (GPCR) are excellent targets in pharmacogenomics research. A large number of experiments are available in this context, while globally interpreting all the experimental data remains a very challenging task for biologists. Our goal is to help the understanding of signaling pathways involving GPCRs and to provide means to semi-automatically construct the signaling networks.
We have introduced a logic-based method to infer molecular networks and shown how it allows inferring signaling networks from the design of a knowledge base. Provenance of inferred data has been carefully collected, allowing quality evaluation. Our method (i) takes into account various kinds of biological experiments and their origin; (ii) mimics the scientist’s reasoning within a first-order logic setting; (iii) specifies precisely the kind of interaction between the molecules; (iv) provides the user with the provenance of each interaction; (v) automatically builds and draws the inferred network.
Note that a logic-based formalisation is used, as in some works carried out in the Inria team Dyliss. The aim of Amib is different, as the design of the network relies on a knowledge-based system describing experimental facts and ontological relationships on background knowledge, together with a set of generic and expressive rules that mimic the expert's reasoning.
This is a collaboration with A. Poupon (Inra-Bios, Tours) that was supported by an Inra-Inria starting grant in 2011-2012.
A great number of methods have been proposed for the study of the behavior of large biological systems. We use three of them. The first one is based on a discrete and direct simulation of the various interactions between the reactants using an entity-centered approach; the second one implements a very efficient variant of the Gillespie stochastic algorithm that can be mixed with the entity-centered method to get the best of both worlds; the third one uses differential equations automatically generated from the set of reactions defining the network.
These three methods have been implemented in an integrated tool, the Hsim system. It mimics the interactions of biomolecules in an environment modelling the membranes and compartments found in real cells. It has been applied to the modelling of the circadian clock of the cyanobacterium, and we have shown pertinent results regarding the spontaneous appearance of oscillations and the factors governing their period.
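For readers unfamiliar with the stochastic approach mentioned above, here is a minimal sketch of the classical Gillespie direct method. It is the textbook algorithm, not the optimized variant implemented in Hsim, and the reaction encoding (a propensity function paired with a dictionary of copy-number changes) is an assumption made for this example.

```python
import random


def gillespie(state, reactions, t_end, rng=random):
    """Simulate one trajectory with the Gillespie direct method.

    state:     dict mapping species names to copy numbers
    reactions: list of (propensity_function, change) where change maps
               species names to the integer change caused by one firing
    Returns the list of (time, state) pairs visited.
    """
    t = 0.0
    state = dict(state)
    trajectory = [(t, dict(state))]
    while True:
        propensities = [f(state) for f, _ in reactions]
        total = sum(propensities)
        if total <= 0:
            break  # nothing can fire any more
        # Waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Choose the reaction with probability proportional to its propensity.
        r = rng.random() * total
        chosen = None
        for p, (_, change) in zip(propensities, reactions):
            if p <= 0:
                continue
            chosen = change
            r -= p
            if r < 0:
                break
        for species, delta in chosen.items():
            state[species] += delta
        trajectory.append((t, dict(state)))
    return trajectory


if __name__ == "__main__":
    # Toy conversion A -> B with rate constant 0.5 per molecule.
    conversion = [(lambda s: 0.5 * s["A"], {"A": -1, "B": +1})]
    for time, snapshot in gillespie({"A": 10, "B": 0}, conversion, t_end=20.0):
        print(round(time, 3), snapshot)
```

Each run gives a different trajectory; averaging many runs approaches the behaviour predicted by the corresponding differential equations when copy numbers are large.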
Synthetic biology is becoming a very popular domain of research. Genetic engineering is a good example of synthetic biology: organisms are artificially modified to boost the production of compounds that might be used in the medical or industrial domains. We have focused on using synthetic biology for medical diagnostic purposes. In a collaboration with the SysdiagLab (UMR 3145) at Montpellier, P. Amar participates in the CompuBioTic project. The goal is to design, test and build an artificial embedded biological nano-computer in order to detect the biological markers of some human pathologies (colorectal cancer, diabetic nephropathy, etc.). This nano-computer is a small vesicle containing specific enzymes and membrane receptors. These components are chosen in such a way that their interactions can sense and report the presence in the environment of molecules involved in the targeted human pathologies. We plan to design a dedicated software suite to help the design and validation of this artificial nano-computer. Hsim is used to help design this "biological computer" and to test it qualitatively and quantitatively before in vitro experiments.
It is now well established in the medical world that the metabolism of organs depends crucially on the way the cells consume oxygen, glucose and the various metabolites that allow them to grow and duplicate. A particular variety of cells, tumour cells, is of major interest. In collaboration with L. Schwartz (AP-HP) and biologists from Inserm-INRA Clermont-Theix, we have started a project aiming at identifying the important points in the metabolic machinery that command the changes in behaviour. The main difficulties come from the fact that biologists have listed dozens of concurrent cycles that can be activated alternatively or simultaneously, and that the dynamic characteristics of the chemical reactions are not known accurately.
Given the set of biochemical reactions that describe a metabolic function (e.g. glycolysis, phospholipid synthesis, etc.), we translate them into a set of ODEs whose general form is most often of the Michaelis-Menten type but whose coefficients are usually very poorly determined. The challenge is therefore to extract information on the system's behavior while making reasonable assumptions on the ranges of values of the parameters. It is sometimes possible to prove the global stability mathematically, but it is also possible to establish it locally in large subdomains by means of simulations. Our program Mpas (Metabolic Pathway Analyser Software) makes the translation into a system of ODEs automatic, leading to easy, almost automatic simulations. Furthermore, we have developed a method of systematic analysis of the systems in order to characterize those reactants which determine the possible behaviors: usually they are enzymes whose high or low concentrations force the activation of one of the possible branches of the metabolic pathways. A first set of situations has been validated with an Inserm-Inra research team based in Clermont-Ferrand. In her PhD thesis, defended in 2011, M. Behzadi proved mathematically the decisive influence of the enzyme PEMT on the Choline/Ethylamine cycles.
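As a minimal illustration of this kind of study (not the Mpas implementation), consider a single irreversible reaction S -> P with Michaelis-Menten kinetics, dS/dt = -Vmax S / (Km + S) and dP/dt = -dS/dt. The sketch below integrates it with a fixed-step fourth-order Runge-Kutta scheme and scans a range of values for a poorly known coefficient. The function names, the step size and the parameter values are assumptions chosen for the example.

```python
def simulate(s0, vmax, km, t_end, dt=0.01):
    """Integrate dS/dt = -vmax * S / (km + S) with classical RK4.

    Returns the substrate and product concentrations at time t_end,
    starting from S = s0 and P = 0.
    """

    def rate(s):
        return -vmax * s / (km + s)

    s, p = s0, 0.0
    for _ in range(int(round(t_end / dt))):
        k1 = rate(s)
        k2 = rate(s + 0.5 * dt * k1)
        k3 = rate(s + 0.5 * dt * k2)
        k4 = rate(s + dt * k3)
        ds = dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        s += ds
        p -= ds  # every consumed unit of substrate becomes product
    return s, p


def scan_vmax(s0, vmax_values, km, t_end):
    """Final substrate level for each candidate value of vmax.

    Scanning a range of values is a simple way to explore the behaviour
    when a coefficient is only known up to an interval.
    """
    return {v: simulate(s0, v, km, t_end)[0] for v in vmax_values}


if __name__ == "__main__":
    print(scan_vmax(10.0, [0.5, 1.0, 2.0], km=2.0, t_end=5.0))
```

For networks with several coupled reactions, the same scan is repeated over a grid or a random sample of parameter vectors, and the qualitative behaviours observed are then classified.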
We study the interest of fungi for biomass transformation.
Cellulose, hemicellulose and lignin are the main components of plant biomass.
Their transformation represents a key energy challenge of the 21st century and
should eventually allow the production of high value new compounds, such as wood
or liquid biofuels (gas or bioethanol).
Among the wood-boring organisms, two groups of fungi differ in how they destroy
the wood compounds.
Analysing new fungal genomes can allow the discovery of new species of high interest for bio-transformation.
For a better understanding of how the fungal enzymes facilitate degradation of
plant biomass, we conduct a large-scale analysis of the metabolism of fungi.
Machine learning approaches such as hierarchical rules prediction
are being studied to find new enzymes allowing the transformation of
biomass. The Kegg database http://
A lightweight Java Applet dedicated to the quick drawing of an RNA secondary structure. VARNA is open-source and distributed under the terms of the GNU GPL license. Automatically scales up and down to make the most out of a limited space. Can draw multiple structures simultaneously. Accepts a wide range of documented and illustrated options, and offers editing interactions. Exports the final diagrams in various file formats (svg,eps,jpeg,png,xfig) ...
VARNA currently ships in its 3.9 version, and consists in
Impact: Downloaded
Availability: Distributed under the terms of the GPL v3 licence since 2009 on simple demand to the author(s) at http://varna.lri.fr.
Cartaj is a software tool that automatically predicts the topological
family of three-way junctions in RNA molecules, from their secondary
structure only: the sequence and the canonical Watson–Crick pairings. The Cartaj software http://
Rna3Dmotif is a free bundle of three easy-to-install programs aimed to be used in combination to automatically extract recurrent RNA local tertiary motifs. The approach used is based on a graph representation of the RNA tertiary structure using LW nomenclature. It was applied to several widely studied ribosomal RNA structures and the motifs thus found were deposited in a dedicated repository.
Impact: Cited in 17 research manuscripts (source: Google Scholar).
Availability: Distributed under the terms of the licence since 24/03/2009 on simple demand to the author(s) at http://rna3dmotif.lri.fr.
A software tool dedicated to the random generation of sequences. Supports different
classes of models, including weighted context-free grammars, Markov models,
ProSITE patterns...
GenRGenS currently ships in its 2.0 version, and consists in
Impact: Downloaded
Availability: Distributed under the terms of the GPL v3 licence since 2006 on simple demand to the author(s) at https://www.lri.fr/ genrgens/.
DiMoVo, DIscriminate between Multimers and MOnomers by VOronoi tessellation: Knowing the oligomeric state of a protein is necessary to understand its function. This tool, accessible as a webserver and still used and maintained, provides a reliable discrimination function to obtain the most favorable state of proteins.
Availability : released in 2008.
VorScore, Voronoi Scoring Function Server: Scoring is a crucial part of a protein-protein docking procedure, and having a quantitative function to evaluate conformations is mandatory. This server provides access to a geometric knowledge-based evaluation function. It is still maintained and widely used. See Bernauer et al., Bioinformatics, 2007 23(5):555-562 for further details.
High-throughput technologies provide fundamental information concerning thousands of genes. Most current biological research laboratories use one or more of these technologies daily and identify lists of genes.
Understanding the results obtained includes accessing the latest publications concerning individual or multiple genes. Faced with the exponential growth of available publications, this task is becoming particularly difficult to achieve.
Here, we introduce a web-based Java application named GeneValorization, which aims at making the most of the text-mining effort done downstream of all high-throughput technology assays. Regular users come from the Curie Institute, but also from the Ebi.
Impact : 925 distinct international users have used GeneValorization and about a hundred use it on a regular basis. The tool is on average used once to twice every day.
Availability: it is available at http://
Scientific workflow systems are numerous and equipped with provenance modules able to collect data produced and consumed during workflow runs to enhance reproducibility. An increasing number of approaches have been developed to help manage provenance information. Some of them are able to process data in polynomial time, but they require workflows to have series-parallel (SP) structures. Rewriting any workflow into an SP workflow is thus particularly important.
Spflow answers this need: it takes in a workflow (from the Taverna system) and provides a runnable and provenance-equivalent (Taverna) workflow.
Impact: The tool is currently used by Taverna's users from the University of Manchester and more generally by myExperiment users.
Availability: Distributed under the terms of the licence since 04/02/2013 on simple demand to the author(s) at http://www.lri.fr/ chenj/SPFlow/.
Scientific workflow systems are numerous and equipped with provenance modules able to collect data produced and consumed during workflow runs to enhance reproducibility. An increasing number of approaches have been developed to help manage provenance information. Some of them are able to process data in polynomial time, but they require workflows to have series-parallel (SP) structures.
SPChecker is able to detect whether or not any Taverna workflow has a series-parallel structure.
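For readers unfamiliar with the notion, the following sketch shows one classical way to test whether a workflow graph is series-parallel: repeatedly merge parallel edges and contract internal nodes that have exactly one incoming and one outgoing edge, and accept if a single source-to-sink edge remains. This is not SPChecker's implementation, which is not described here; the function name and the edge-list representation are assumptions, and the input is assumed to be a directed acyclic graph with a single source and a single sink.

```python
from collections import Counter


def is_series_parallel(edges, source, sink):
    """Return True if the two-terminal DAG reduces to the single edge (source, sink).

    edges: iterable of (u, v) pairs; repeated pairs are treated as parallel edges.
    Assumes an acyclic graph with one source and one sink.
    """
    multi = Counter(edges)  # multigraph: edge -> multiplicity
    changed = True
    while changed:
        changed = False
        # Parallel reduction: collapse multiple edges between the same endpoints.
        for edge in list(multi):
            if multi[edge] > 1:
                multi[edge] = 1
                changed = True
        # Series reduction: contract one internal node with in-degree 1 and out-degree 1.
        indeg, outdeg = Counter(), Counter()
        pred, succ = {}, {}
        for (u, v), count in multi.items():
            outdeg[u] += count
            indeg[v] += count
            succ[u] = v
            pred[v] = u
        for v in list(indeg):
            if v in (source, sink) or indeg[v] != 1 or outdeg[v] != 1:
                continue
            u, w = pred[v], succ[v]
            if u == v or w == v:  # self-loop: not a DAG, leave untouched
                continue
            del multi[(u, v)]
            del multi[(v, w)]
            multi[(u, w)] += 1
            changed = True
            break  # degrees are stale now; recompute on the next pass
    return list(multi) == [(source, sink)]
```

For instance, a diamond-shaped workflow (two independent branches between a common start and end) is accepted, whereas adding a cross link between the two branches yields a graph on which no reduction applies, so it is rejected.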
Impact: The tool is currently used by Taverna's users from the University of Manchester and more generally by myExperiment users (a collaboration with Manchester has started and should significantly augment the number of potential users).
Availability: Distributed under the terms of the licence since 01/02/2013 on simple demand to the author(s) at http://www.lri.fr/ chenj/SPChecker/.
BioGuide/BioGuideSRS : this software helps the scientists choose suitable sources and tools, find complementary information in sources, and deal with divergent data.
Reference : Sarah Cohen-Boulakia, Olivier Biton, Susan Davidson, Christine Froidevaux, BioGuideSRS: Querying Multiple Sources with a user-centric perspective, Bioinformatics, March, 23(10), 1301-1303, 2007.
Impact: The paper related to the tool has been cited by
Availability: Distributed under the terms of the licence since 01/09/2006 on simple demand to the author(s) at http://bioguide-project.net/.
Hsim (Hyperstructure Simulator) is a simulation tool for studying the dynamics of biochemical processes in a virtual bacterium. The model is given using a language based on probabilistic rewriting rules that mimic the reactions between biochemical species. Hsim is a stochastic automaton that implements an entity-centered model of objects. This kind of modelling approach is an attractive alternative to differential equations for studying the diffusion and interaction of the many different enzymes and metabolites in cells, which may be present in either small or large numbers.
The new version of Hsim includes a Stochastic Simulation Algorithm à la Gillespie that can be used with the same model, either in a standalone way or mixed with the entity-centered algorithm. This new version also offers the possibility to export the model to SciLab for ODE integration. Finally, Hsim can export the system of differential equations equivalent to the model to LaTeX for pretty-printing.
This software is freely available at http://
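As a reminder of what a Gillespie-style Stochastic Simulation Algorithm does, here is a minimal sketch of the direct method. It is independent of Hsim and does not use its modelling language: the function name, the encoding of reactions as `(rate_constant, reactants, products)` triples and the use of binomial coefficients for propensities are assumptions made for this illustration.

```python
import math
import random


def gillespie(initial_state, reactions, t_end, rng):
    """Direct-method stochastic simulation.

    initial_state: dict species -> molecule count
    reactions: list of (rate_constant, reactants, products), where reactants and
               products are dicts species -> stoichiometric coefficient
    Returns a list of (time, state) pairs, one per fired reaction plus the start.
    """
    t = 0.0
    state = dict(initial_state)
    trajectory = [(t, dict(state))]
    while True:
        # Propensity of each reaction: rate constant times the number of
        # distinct reactant combinations currently available.
        propensities = []
        for rate, reactants, _products in reactions:
            a = rate
            for species, n in reactants.items():
                a *= math.comb(state.get(species, 0), n)
            propensities.append(a)
        total = sum(propensities)
        if total == 0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick which reaction fires, proportionally to its propensity.
        index = rng.choices(range(len(reactions)), weights=propensities)[0]
        _rate, reactants, products = reactions[index]
        for species, n in reactants.items():
            state[species] -= n
        for species, n in products.items():
            state[species] = state.get(species, 0) + n
        trajectory.append((t, dict(state)))
    return trajectory
```

A call such as `gillespie({"A": 50, "B": 0}, [(1.0, {"A": 1}, {"B": 1})], 10.0, random.Random(0))` simulates the conversion of a small population of A molecules into B, a regime where molecule numbers are too low for differential equations to be a faithful description.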
Extensive experiments revealed a drift of existing software towards sequences with a high G+C-content. Relying on our random generation methods, we showed how to control this distributional bias in sequences using multidimensional Boltzmann sampling. We also explored the combination of random generation (global sampling) and local search into a novel category of glocal approaches, yielding promising results.
Finally, we explored language-theoretic constructs, namely products of finite-state automata and context-free languages, to force or forbid the presence of identified functional motifs within designed sequences.
Ab initio research benefited from our work on the search for and classification of RNA structural motifs. Significant progress towards the ab initio prediction of the 3D structure of large RNAs was achieved. This problem is beyond the scope of current approaches, and we proposed a promising coarse-grained approach based on game theory that scales up to several hundred bases.
In the field of RNA computational biology, many algorithms use dynamic programming to partition the folding landscape according to a set of structural parameters. More precisely, the goal is to compute the number (resp. cumulated Boltzmann weight)
In collaboration with P. Clote's group (Boston College), we have described generic algorithmic principles to dramatically decrease these complexities, and make this class of algorithms practical. The main idea is to capture the partitioned space within a large polynomial, which can typically be efficiently evaluated (typically in
The random generation of decomposable combinatorial structures, pioneered by P. Flajolet in the 80s, provides an elegant, yet powerful, framework to model and sample the objects which appear in computational biology. Random samples can then be used to assess the significance of a given observable when closed-form formulae are difficult to obtain.
Messenger RNAs (mRNAs) encode proteins, but may also independently feature structured motifs which are crucial to recoding and alternative splicing mechanisms. In order to predict such motifs, the stability of smaller regions within a given mRNA must be compared to that of sequences generated with respect to a background model which, at the same time, preserves the encoded amino-acid sequence and the capacity of the overall sequence to form a stable fold (approximated by the dinucleotide composition). Using multidimensional Boltzmann sampling, we have revisited the underlying random generation problem (well-defined, yet never solved exactly), and provided the first unbiased and practical algorithm for it. The algorithm, developed in collaboration with McGill and Université de Montréal (Canada), has linear time complexity as soon as a small tolerance (typically
Some other biological objects, such as RNA secondary structures, naturally appear with probabilities which are poorly modeled by the uniform distribution. To better model such objects, Denise et al. have introduced the weighted distribution, and adapted classic random generation algorithms such that each object within a given combinatorial family can be generated with respect to it. However, the exponentially increasing probability ratio between the most and least probable objects sometimes leads to a large degree of redundancy within generated sets. To work around this issue, and generate non-redundant sets of objects, we have proposed a sequential algorithm which deterministically avoids any previously generated word, without introducing any bias in the generation.
Besides, in collaboration with the Fortesse group at Lri, we developed a new divide-and-conquer algorithm for the random generation of words of regular languages, and we performed a complete benchmarking of all state-of-the-art methods dedicated to this problem.
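To make the underlying problem concrete, the sketch below draws a word of fixed length uniformly at random from a regular language given by a deterministic automaton. It uses the classical recursive (counting) method, one of the standard baselines for this task, and not the divide-and-conquer algorithm mentioned above. The function names and the dictionary encoding of the automaton are assumptions made for illustration.

```python
import random


def count_words(dfa, accepting, n):
    """counts[k][q] = number of words of length k leading from state q to acceptance.

    dfa: dict state -> dict letter -> next state (missing letters mean rejection);
         every reachable state must appear as a key.
    """
    counts = [{q: (1 if q in accepting else 0) for q in dfa}]
    for _ in range(n):
        prev = counts[-1]
        counts.append({q: sum(prev[r] for r in dfa[q].values()) for q in dfa})
    return counts


def random_word(dfa, start, accepting, n, rng):
    """Draw a word of length n uniformly among those accepted from `start`."""
    counts = count_words(dfa, accepting, n)
    if counts[n][start] == 0:
        raise ValueError("the language contains no word of this length")
    word, state = [], start
    for remaining in range(n, 0, -1):
        letters = sorted(dfa[state])
        # Each letter is chosen proportionally to the number of ways
        # the word can still be completed after reading it.
        weights = [counts[remaining - 1][dfa[state][a]] for a in letters]
        letter = rng.choices(letters, weights=weights)[0]
        word.append(letter)
        state = dfa[state][letter]
    return "".join(word)
```

With the two-state automaton `{0: {"a": 0, "b": 1}, 1: {"a": 0}}` (both states accepting), which recognises words over {a, b} without two consecutive b's, `random_word(dfa, 0, {0, 1}, 5, random.Random(1))` returns one such word of length 5, each being equally likely.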
As a by-product of our previous collaborative studies with J. Waldispühl (McGill, Canada), focusing on the sequence/structure relationship in RNA, we revisited the problem of detecting and correcting errors in RNA sequences obtained using pyrosequencing techniques. Indeed, ribosomal RNAs are often used to estimate the population diversity within a microbiome, and sequencing errors may lead to biased estimates. In this context, we investigated whether a complete knowledge of the RNA secondary structure could be exploited to detect and correct errors in NGS reads.
To that end, we introduced a probabilistic model, defined over all sequences at maximal distance
An algorithm for p-value computation that takes into account a
Hidden Markov Model has been proposed, and an implementation, SufPref,
has been realized (http://
The combinatorics of clumps has been extensively studied, leading to the definition of the so-called canonic clumps. It has been shown that they contain the information needed to calculate, approximate, and study probabilities of occurrences and asymptotics. This motivates the development of a clump automaton. It allows for a derivation of p-values, decreasing the space and time complexity of the generating function approach or previous weighted automata.
Large deviation approximations are needed for very rare events, e.g. very small p-values, as Gaussian approximations are known not to be applicable. Combinatorial properties of words allow us to provide an explicit and tractable formula for the tail distribution with a low space and time complexity and a guaranteed tightness. The double-strand counting problem is addressed, where dependencies between a sequence and its complement play a fundamental role. A large deviation result is also provided for a set of small sequences with non-identical distributions. Possible applications are the search for cis-acting elements in regulatory sequences that may be known, for example from ChIP-chip or ChipSeq experiments, to be under a similar regulatory control. In a recent internship at Lix, F. Pirot detected a Chi-like motif in an Archaea genome.
In a collaboration with AlFarabi University (where M. Régnier acts as a foreign co-advisor), word statistics were used to identify mRNA targets for miRNAs involved in various cancers.
We study and design algorithmic solutions addressing the secondary structure, an abstraction of the 3D conformation of a molecule that only retains the contacts between its residues. Although this representation may disregard some of the fine details of the molecular conformation, it still retains the general architecture of molecules, and is especially useful in the study of RiboNucleic Acids (RNAs) and transmembrane beta-barrel proteins (TMBs). The latter class of proteins accounts for 20 to 30% of identified proteins in a genome but, due to difficulties with standard experimental techniques, they constitute only 2% of the RCSB Protein Data Bank. As TMBs perform many vital functions, the prediction of their structure is a challenge for life sciences, while the small number of known structures prohibits knowledge-based methods for structure prediction. As TMBs are strongly structured objects, model-based methodologies are an interesting alternative to these conventional methods. The efficiently obtained 3D structures provide a good model for further 3D and interaction analyses.
In a recent work, we focused on the identification of protein-protein complexes based on the putative interaction between pairs of proteins as the sole source of information. From the results obtained on E. coli, we started working on the prediction of multi-body protein complexes from sequence information alone.
In our protein-RNA project, we managed to obtain the first learning results. We optimized the RosettaDock scores and showed that such an optimization cannot be done efficiently without expert knowledge. The first results are to be presented at EGC in 2014.
The year 2013 saw the conclusion of a long-term collaboration involving A. Carbone (UPMC) and A. Lopes (IGM, Paris XI). In a recent paper published in the prestigious PLoS Computational Biology journal, we showed that combining coarse-grain molecular cross-docking simulations and binding site predictions based on evolutionary sequence analysis is a viable route to identify true interacting partners for hundreds of proteins with a varied set of protein structures and interfaces. We also carried out a large-scale analysis of protein binding promiscuity and provided a numerical characterization of partner competition and level of interaction strength for about 28000 false-partner interactions. Finally, we demonstrated that binding site prediction is useful to discriminate native partners, but also to scale up the approach to thousands of protein interactions. This study was based on a large computational effort made by thousands of internet users helping the World Community Grid over a period of 7 months.
Work performed in the Data Integration axis this year has been dedicated to the design and implementation of a new approach to reduce the complexity of scientific workflow structures. More precisely, we focused on the presence of “anti-patterns” in the workflow structures, idiomatic structures that lead to over-complicated design. We have then proposed the DistilFlow method and a tool for automatically detecting such anti-patterns and replacing them with different patterns which result in a reduction in the workflow's overall structural complexity (BMC Journal paper accepted, published early 2014). This work has been performed in close collaboration with the Taverna group from the University of Manchester.
DistilFlow is part of the thesis of J. Chen, who defended his PhD on October 11th, 2013 and is now back in China as a research assistant at Lanzhou University.
Systems Biology includes the study of interaction networks such as gene regulatory, metabolic, or signaling networks. It involves both designing the topology of the networks and predicting their dynamic and spatiotemporal aspects. It requires the import of concepts from across various disciplines and crosstalk between theory, benchwork, modelling and simulation.
We have developed a biclustering algorithm for elementary flux modes that is based on the Agglomeration of Common Motifs (ACoM). This allows a drastic reduction of the number of less significant fluxes and a kind of factorization of the most important fluxes, yielding an algorithm running in quadratic time in the number of elementary flux modes.
We applied this algorithm to describe the decomposition into elementary flux modes of the central carbon metabolism in Bacillus subtilis and of the yeast mitochondrial energy metabolism. For Bacillus subtilis, a specific inhibition on the second domain of the lipoamide dehydrogenase (pdhD) component of the pyruvate dehydrogenase complex that leads to the loss of all fluxes was exhibited. Such a conclusion is not predictable in the classical approach.
A collaboration with Igm on the evolution of metabolic networks is ongoing. We aim at understanding how such networks would emerge over time among the variety of species, and how these changes could be responsible for characteristic life traits. Our methodology to characterize the evolutionary origin of the enzymatic repertoire of different fungal groups relies on machine learning. Preliminary results were presented at Jobim 2013.
Our goal is to help the understanding of signaling pathways involving G protein-coupled receptors (GPCRs) and to provide means to semi-automatically construct the signaling networks. Our method takes into account various kinds of biological experiments and their origin, and automatically builds and draws the inferred network. Comparing the automatically deduced network with an already known fragment of the FSHR network allowed us to obtain new interesting hypotheses that are currently being tested experimentally by biologists, our collaborators from Inra-Biosin Tours. In the next months, experimental data for some GPCRs (FSH, 5HT2 and 5HT4) will be prepared by Bios and Igf (Montpellier), in the context of a GPCRnet ANR project.
Besides, in collaboration with K. Inoue, through the NII International Internship Program, we have studied the Systems Biology Graphical Notation language, a standard for expressing molecular networks, especially signaling networks, and proposed a translation of SBGN-AF into a logical formalism.
In a collaboration of P. Amar with microbiologists, the group of Marie-Joëlle Virolle from the Institut de Génétique et de Microbiologie, a first explanatory model was proposed for the sigmoidal shape of the survival curve of bacteria (S. lividans) carrying an antibiotic resistance gene, expressed at different levels, in the presence of a constant concentration of antibiotics.
This is particularly important since this method of including an antibiotic resistance gene to report the activity of its promoter is widely used in the Streptomyces community.
It is shown in M. Behzadi's PhD thesis that most systems have very stable behaviours and that even large variations of their chemical characteristics do not affect the nature of the equilibria. This very general situation has been discovered by simulation but in some cases it is even possible to prove it mathematically.
Our collaborators M. Israël and L. Schwartz have listed more than a hundred such tentative bifurcations that we intend to study systematically. A preliminary study of the mitotic cycle with L. Paulevé has also highlighted the strong influence of the pH of the cell on its capacity to duplicate. The PhD thesis of E. Bigan, co-directed by S. Daoudi (Univ. Denis Diderot) and J.-M. Steyaert, investigates the generic properties of such complex systems and confirms that the ones we have already studied are not exceptions. Some prospective cases have also been studied.
A. Denise is the coordinator of the "Japarin-3D" Digiteo project 2012-2016. This project, in collaboration with Prism at Versailles, aims to develop new efficient approaches for predicting the 3D structure of large RNA molecules, by applying game theory and graph algorithms.
A. Denise is involved in the NSD-NGD ANR project 2010-2014. Y. Ponty is involved in the Magnum ANR project (BLAN program, 12/2010–12/2014).
Ch. Froidevaux was responsible for the CNRS-INSERM-Inria Peps grant Identification of metabolic capabilities of fungi by comparative genomics, involving Igm, Paris-Sud and UMR GV, CNRS.
Program: Partenariat Hubert Curien (PHC) Procope (Jointly funded by Egide and DAAD)
Project acronym: SOSW
Project title: Sharing and Optimizing Scientific Workflows
Duration: 2013 - 2015
Coordinator: Sarah Cohen-Boulakia
International Partner
U. Humboldt (Berlin, Germany)
Institute for Computer Science
Ulf Leser
Abstract : Considerable effort has been put into the development of scientific workflow management systems. They support scientists in developing, running, and monitoring chains of data analysis programs. A variety of systems have reached a level of maturity that allows them to be used by scientists for their bioinformatics experiments, especially including analysis of NGS data. However, each scientific group has its own way of analyzing NGS data, using a particular set of tools, in a particular order. The aim of this project is to exploit the complementary skills of the two European groups involved to develop approaches promoting exchange of (optimized) workflows.
Title: Intelligent Techniques for Structure of Nucleic Acids and Proteins
Inria principal investigator: Julie Bernauer
International Partner (Institution - Laboratory - Researcher):
Stanford University (United States) - Computational Structural Biology, School of Medicine, Structural Biology - Julie Bernauer
Duration: 2012 - 2014
See also: http://
The ITSNAP Associated Team project is dedicated to the computational study of RNA 3D structure and interactions. By developing new molecular hierarchical models for knowledge-based and machine learning techniques, we can provide new insights on the biologically important structural features of RNA and its dynamics. This knowledge of RNA molecules is key in understanding and predicting the function of current and future therapeutic targets.
CARNAGE
Program: Inria-Russia
Title: CARNAGE: Combinatorics of Assembly and RNA in GEnomes
Inria principal investigator: Mireille Régnier
International Partner (Institution - Laboratory - Researcher):
State Research Institute of Genetics and Selection of Industrial Microorganisms (Russia (Russian Federation)) - Bioinformatics laboratory - Mireille Régnier
Duration: 2012 - 2014
See also: https://
CARNAGE addresses two main issues on genomic sequences, by combinatorial methods.
The fast development of high-throughput technologies has generated a new challenge for computational biology. The recently introduced competing technologies each promise dramatic breakthroughs in both biology and medicine. At the same time, the main bottleneck in applications is the computational analysis of experimental data. The sheer amount of this data, as well as the throughput of the experimental dataflow, represents a serious challenge to hardware and especially software. We aim at bridging some gaps between the new "next generation" sequencing technologies and the current state of the art in computational techniques for whole genome comparison. Our focus is on combinatorial analysis for NGS data assembly, interspecies chromosomal comparison, and the definition of standard pipelines for routine large-scale comparison.
This project also addresses combinatorics of RNA and the prediction of RNA structures, with their possible interactions.
Polytechnique/UPSud and McGill/U. Montréal
Program: CFQCU
Title: Réseau franco-québecois de recherche sur l'ARN
Inria principal investigator: Jean-Marc Steyaert
International Partner (Institution - Laboratory - Researcher):
Mc Gill and Université de Montréal (Canada)
Computer Science Department
Jérôme Waldispühl
Duration: 2012 - 2014
Abstract: The partners have developed complementary expertise on RNA: bioinformatics, combinatorics and algorithms, machine learning, physics and genomics. Methodologies will be developed that combine theoretical simulations and new (high-throughput) experimental data. A common high-level training at Master and PhD level is organized.
R. Fonseca spent 5 months at SLAC in Stanford to work with Henry van den Bedem. J. Bernauer spent two weeks at SLAC. The associated team members also presented their work at the Inria BIS 2013 Workshop in Stanford https://
Adrien Rougny was an intern at Nii from February to August 2013 with the support of the "Nii International Internship Program". He worked on the topic "Inference and Learning for Systems Biology and Network Dynamics" in Pr. Katsumi Inoue's group, a long-term collaborator of Ch. Froidevaux.
J. Bernauer is coordinator with Pr. X. Huang at the Hong-Kong University of Science and Technology of a Partenariat Hubert Curien (PHC) Procore project (2012-2013). The project is entitled Computational studies of conformational dynamics of the RNA-induced silencing complex and design of miRNAs to target oncogenes.
H.K. Hwang
Subject: Probabilistic Analysis of A Simple Evolutionary Algorithm
Institution: Taipeh University (Taiwan)
V. Reinharz
Subject: RNA 3D structure analysis
Institution: McGill University (Canada)
E. Furletova
Subject: word enumeration
Institution: Institute of Mathematical Problems in Biology (Russia)
C. Moutet (May and June 2013)
Subject: Poor mappability regions in assembly
Institution: ENS Lyon and Ecole Polytechnique Fédérale de Lausanne
Funding: Inria
Supervision: M. Régnier
F. Pirot (May and June 2013)
Subject: Exceptional words in Archaea genomes
Institution: ENS Lyon
Funding: Inria
Supervision: M. Régnier
B. Fang (May to July 2013)
Subject: Clumps combinatorics, automata and word asymptotics
Institution: Princeton University (United States)
Funding: Ecole Polytechnique
Supervision: M. Régnier
J. Moussu (April to July 2013)
Subject: Repeats in genomic sequences
Institution: Rennes University
Funding: Inria
Supervision: M. Régnier
M. Pichene (April to July 2013)
Subject: Graph algorithms and protein-protein interactions
Institution: Paris-Sud University
Funding: Inria
Supervision: J. Bernauer
L. Uroshlev (June 2013)
Subject: Reference state for RNA KB potentials
Institution: IOGEN (Moscow, Russia)
Funding: Inria (CARNAGE)
Supervision: J. Bernauer
O. Berillo (January and December 2013)
Subject: miRNAs and oncogenes
Institution: El Farabi University (Almaty, Kazakhstan)
Funding: El Farabi University
Supervision: M. Régnier
A. Bari (March 2013)
Subject: stress-inducible miRNAs
Institution: El Farabi University (Almaty, Kazakhstan)
Funding: El Farabi University
Supervision: M. Régnier
Sep. 2013–Sep. 2014: Y. Ponty is visiting PIMS and Simon Fraser University (Vancouver, Canada)
The whole team is involved in GDR-Bim (Molecular Bioinformatics, http://
A. Denise, Y. Ponty and M. Régnier participate in the
subdomain Sequence Analysis and in the Comatege subgroup of
GDR-Im
(Informatique Mathématique, http://
A. Denise, Y. Ponty, J.-M. Steyaert, and M. Régnier are involved in the Alea working group
(http://
We received in our weekly seminar: D. Saakian (A. Sinica, Taiwan), V. Reinharz (McGill), L. Tchertanov (ENS Cachan), H. Babou (Nantes), P. Ballarini (Ecp), Nicolas Ferey (Limsi), Van Du TRAN Thong (Igm), Ulf Leser (Humboldt U.), A. Zinoviev (Institut Curie, Paris), H.K. Hwang (Taipeh U.).
J. Bernauer presented her work at the International Conference on Biomolecular Dynamics: Experiment Meet Computation, KAUST, Saudi Arabia.
R. Fonseca gave a talk at the Inria@SiliconValley Workshop Bis2013 in May in Stanford.
D. Iakovishina gave two talks at Institut Curie: at the weekly seminar of NGS and during the “Structural variants day” in December 2013.
J. Bernauer visited H. van den Bedem at SSRL (Slac) and M. Levitt at Stanford University (USA). She visited the Huang group at Hkust (Hong-Kong).
M. Régnier and D. Iakovishina visited IoGene (Moscow).
P. Amar was a member of the steering committee and chair of the organizing committee for the aSSB workshop (advances in Systems and Synthetic Biology), Nice (2013).
J. Bernauer was a member of ICBD 2013 program committee.
S. Cohen-Boulakia was a member of the DILS 2013, SWEET 2013 and BDA 2013 program committees and she is a member of the editorial board of the Journal on Data Semantics.
Ch. Froidevaux is a member of the editorial board of 1024, Bulletin de la Société Informatique de France, SIF.
Y. Ponty, M. Régnier, and J.-M. Steyaert served as PC members for Bicob 2013 (5th International Conference on Bioinformatics and Computational Biology, Honolulu, USA).
Y. Ponty served as PC member for Ismb/Eccb 2013 (21st International conference on Intelligent Systems for Molecular Biology/12th European Conference on Computational Biology).
M. Régnier co-organized Mccmb'13
http://
F. d'Alché-Buc, Ch. Froidevaux and Y. Ponty were members of Jobim 2013 program committee.
J. Bernauer organized the Iamb workshop (Integrative Approaches for Modeling Biomolecular Complexes) in Nice in collaboration with McGill University (Canada) and Nice University.
A one day meeting on Cancer and Metabolism was organized at Lix by J.-M. Steyaert on October 4th.
J. Bernauer is a member of the IDEX Paris - Saclay Groupe de travail Sciences du Vivant.
J. Bernauer and C. Froidevaux are members of the Comité de Pilotage of the IDEX Paris - Saclay Institut transverse de Modélisation des Sciences du Vivant.
A. Denise is a member of the Scientific Commission of the Inria-Saclay research center. He is deputy director of the computer science department at University Paris-Sud. He is a member of the Academic Senate of the Paris-Saclay University.
Ch. Froidevaux is the head of the Bioinfo group at Lri. She was a member of a hiring committee for a Full Professor position at Polytech Paris Sud, Orsay.
Y. Ponty is an elected member of the Comité national du CNRS (6th section – Foundations of Computer Science and CID 51 –Bioinformatics).
M. Régnier is a deputy-member of Digiteo program committee.
J.-M. Steyaert is a member of the Board of Administrators of Polytechnique.
J.-M. Steyaert has contributed to the organization of a workshop in July 2013 to present currently running projects between AP-HP and Polytechnique. He serves on the selection committee that chooses an MD from AP-HP for a yearly funded research position in the Polytechnique Research Center.
We have trained, and will continue to train, a group of good multi-disciplinary students at both the Master and PhD levels. Being part of this community as a serious training group is obviously an asset. Our project is also very much involved in two major student programs in France: the Master BIBS (Bioinformatique et Biostatistique) at Université Paris-Sud/École Polytechnique and the parcours d'Approfondissement en Bioinformatique at École Polytechnique. We are also involved in a student partnership with McGill University (partenariat France Quebec) offering French and Canadian students co-supervised internships (short term, 3 to 6 months, or long term, as part of the PhD studies). J.-M. Steyaert is involved in the development of an interdisciplinary cooperation between Polytechnique and AP-HP that will favor internships of Polytechnique students and Master's students in AP-HP operational services.
Ch. Froidevaux is a member of the Scientific Committee of the Computer Science Doctoral School of Paris-Sud University.
J.-M. Steyaert organizes Bibs (M1 and M2) at Ecole Polytechnique. Ch. Froidevaux co-heads the Master (M1 and M2) at University Paris-Sud. Most team members teach in this Master.
J. Bernauer was appointed Chargé d'enseignement in the Computer Science Department of École Polytechnique (DIX) in 2013.
Master Bibs: J. Bernauer, Informatique théorique et Programmation Python, 20h, M2, Université Paris-Sud, France
Cycle Ingénieur Polytechnicien: J. Bernauer, Modal Bioinformatique, 18h, 2ème année, École Polytechnique, France
Cycle Ingénieur Polytechnicien: J. Bernauer, Algorithmes et Programmation INF421, 36h, 2ème année, Ecole Polytechnique, France
Cycle Ingénieur Polytechnicien: J. Bernauer, Modal Web Tablette INF441a, 36h, 2ème année, Ecole Polytechnique, France
Cycle Ingénieur Agro Paris Tech: J. Bernauer, Module AAB, cours invité, 3ème année, Agro Paris Tech, France
Master Bibs: Y. Ponty, M. Régnier, J.-M. Steyaert, Combinatoire, Algorithmes, Séquences et Modélisation (Casm), 32h, M2, Université Paris-Sud, France
Master : J.-M. Steyaert, X cycle ingénieur INF582- Datamining, 35h, M1, Ecole Polytechnique, France
Master : J.-M. Steyaert, X cycle ingénieur BioINF588- Algorithms for Bioinformatics, 35h, M1, Ecole Polytechnique, France
Licence : J.-M. Steyaert, X cycle ingénieur Modal-BioInformatique, 45h, L3, Ecole Polytechnique, France
Master : J.-M. Steyaert, Bibs Algorithmique avancée et optimisation, 25h, M2, X-Orsay, Ecole Polytechnique, France
Data Bases, 48h, M1 Bibs (Bioinformatics and BioStatistics), Paris-Sud University, France (C. Froidevaux)
Advanced Algorithmics, 48h, M1 Bibs (Bioinformatics and BioStatistics), Paris-Sud University, France (C. Froidevaux)
Integration and Analysis of heterogeneous data from the Web, 24h, M2 Bibs (Bioinformatics and BioStatistics), Paris-Sud University/École Polytechnique, France (J. Azé, S. Cohen Boulakia, C. Froidevaux)
Advanced Data Bases and Data Mining, 42h, M2 Bibs (Bioinformatics and BioStatistics), Paris-Sud University/École Polytechnique, France (S. Cohen Boulakia, C. Froidevaux).
Initiation to Research, 6h, M2 Bibs (Bioinformatics and BioStatistics), Paris-Sud University, France (C. Froidevaux)
Software Engineering for Bioinformatics, 48h, M2 BIBS (Bioinformatics and BioStatistics), Paris-Sud University/École Polytechnique, France (P. Amar)
Modelling and Simulation of Biological Processes, 24h, M2 BIBS (Bioinformatics and BioStatistics), Paris-Sud University/École Polytechnique, France (P. Amar)
Biological Networks and Systems Biology, 9h, M1 BIBS (Bioinformatics and BioStatistics), Paris-Sud University/École Polytechnique, France (P. Amar)
RNAomics and RNA Bioinformatics, 12h, M2 BIBS (Bioinformatics and BioStatistics), Paris-Sud University/École Polytechnique, France (A. Denise)
Theoretical Computer Science, 30h, M2 BIBS (Bioinformatics and BioStatistics), Paris-Sud University/École Polytechnique, France (A. Denise)
HdR : Patrick Amar, Contributions à l'étude de la dynamique des systèmes biologiques et aux systèmes de calcul en biologie synthétique, Paris Sud University, 19/12/2013
PhD : Jiuqiang Chen, Designing scientific workflows following a structure and provenance-aware strategy, Université Paris Sud, Defended on 10/10/2013, S. Cohen-Boulakia and C. Froidevaux.
PhD in progress : Mélanie Boudard, Game theory and stochastic learning for predicting the three-dimensional structure of large RNA molecules, 15/10/2012, D. Barth (Univ. Versailles), J. Cohen (CNRS, LRI) and A. Denise.
PhD in progress : Marc Bouffard, Étude de circuits logiques moléculaires et détection de portes logiques dans un réseau métabolique, Université Paris Sud, 01/10/2013, P. Amar and F. Molina.
PhD in progress : Bryan Brancotte, Ranking biological and biomedical data: algorithms and applications, Université Paris Sud, 01/10/2012, S. Cohen-Boulakia and A. Denise.
PhD in progress: Adrien Guilhot-Gaudeffroy, Modelling and scoring of protein-RNA complexes, 01/10/2011, J. Azé, J. Bernauer, C. Froidevaux.
PhD in progress: Daria Iakovishina, A Combinatorial Approach to Assembly Algorithms, 01/11/2011, M. Régnier.
PhD in progress : Adrien Rougny, Raisonnements sur des connaissances biologiques pour la construction et l'analyse des réseaux de signalisation, 01/10/2013, C. Froidevaux.
PhD in progress : Antoine Soulé, Evolutionary study of RNA-RNA interactions in yeast, 01/09/2013, J.-M. Steyaert and J. Waldispühl (McGill University, Canada).
PhD in progress : Bo Yang, Bioinformatics approaches for studying the relations between RNA structure and pre-messenger RNA splicing, 01/10/2011, A. Denise and Fu Xiangdong (Wuhan University, China).
PhD in progress : Cong Zeng, Identification of structural motifs in messenger RNAs, 01/10/2011, A. Denise.
HDR
Ch. Froidevaux was a reviewer for an HDR (Montpellier).
J.-M. Steyaert served as a jury member for Hubert Lincet HDR defence (Caen).
PhD
P. Amar served as a referee for Laurent Crepin's PhD defence (Brest University).
Ch. Froidevaux served as a referee for a PhD thesis in Rennes and was a member of the committee for J. Leblay.
M. Régnier served as a referee for O. Abdou Arbi's PhD defence (Rennes University).
Funding agencies
ANR 2012-2013, SIMI2, J. Bernauer and S. Cohen-Boulakia
UEFISCDI 2011-2013 (Research Council Romania), Y. Ponty
Selection committees
Cnrs CR/DR: comité national (Section 6 and CID 51), Y. Ponty.
Inria CR2/CR1 committee: Saclay, J. Bernauer.
Maître de conférences: Paris-Sud, Computer science department, S. Cohen-Boulakia and J. Azé.
Maître de conférences: Bordeaux I, A. Denise.
Ingénieur de recherche: LIP6 UPMC (Paris), Y. Ponty.
Chargés d'enseignement and Professeur: Ecole Polytechnique, M. Régnier and J.-M. Steyaert.
Outreach seminar at Lycée Blaise Pascal (Orsay, France) – Yann Ponty – Popular science seminar (2h), jointly organised by Inria (Saclay) and Académie de Versailles.
Unithé ou café, Inria Saclay popularization seminar, Les briques de construction de la vie, see: https://
We also had the opportunity to take part in a few valorization-related events. RNA structural studies were presented at the Rencontres Inria Industrie - Modélisation, simulation et calcul intensif in June 2013 http://