PDiffView: Viewing the Difference in Provenance of Workflow Results

AMIB Algorithms and Models for Integrative Biology

Computational Biology

Digital Health, Biology and Earth

Team website: http://team.inria.fr/amib/

Team created on 2009 May 01; project-team created on 2011 January 01.

Partners: Laboratoire d'informatique de l'école polytechnique (LIX), Laboratoire de recherche en informatique (LRI), CNRS, Université Paris-Sud (Paris 11), Ecole Polytechnique

Keywords: Computational Structural Biology, Annotation, Systems Biology, Machine Learning, Algorithms

Members (Inria Saclay)

Research Scientists
Mireille Régnier (Team leader, Inria, Senior Researcher, HDR)
Julie Bernauer (Inria, Researcher)
Loic Paulevé (CNRS, Researcher, from Oct 2013)
Yann Ponty (CNRS, Researcher)

Faculty Members
Patrick Amar (Univ. Paris-Sud, Associate Professor, HDR)
Alain Denise (Univ. Paris-Sud, Professor, HDR)
Jérôme Azé (Univ. Paris-Sud, Associate Professor, until Aug 2013)
Sarah Cohen-Boulakia (Univ. Paris-Sud, Associate Professor)
Christine Froidevaux (Univ. Paris-Sud, Professor, HDR)
Sabine Pérès (Univ. Paris-Sud, Associate Professor)
Jean-Marc Steyaert (Ecole Polytechnique, HDR)

External Collaborator
Philippe Chassignet (Ecole Polytechnique, Associate Professor)

PhD Students
Erwan Bigan (Ecole Polytechnique)
Mélanie Boudard (Univ. Versailles and Univ. Paris-Sud)
Bryan Brancotte (Univ. Paris-Sud)
Jiuqiang Chen (Univ. Paris-Sud)
Adrien Guilhot-Gaudeffroy (Univ. Paris-Sud)
Daria Iakovishina (Ecole Polytechnique)
Adrien Rougny (ENS Lyon, from Sep 2013)
Antoine Soulé (Ecole Polytechnique, from Oct 2013)
Cong Zeng (Univ. Paris-Sud)
Bo Yang (Univ. Paris-Sud and Wuhan University)

Post-Doctoral Fellow
Rasmus Fonseca (Inria)

Visiting Scientist
Vladimir Reinharz (PhD student, from Jan 2013 until May 2013)

Administrative Assistant
Évelyne Rayssac (Ecole Polytechnique)

Overall Objectives

Introduction

Our project addresses a central question in bioinformatics, namely the molecular levels of organization in the cell. The biological function of macromolecules such as proteins and nucleic acids relies on their dynamic structural nature and their ability to interact with many different partners. Therefore, folding and docking are still major issues in modern structural biology; we currently concentrate our efforts on structure, interactions, evolution and annotation, and aim to contribute to protein engineering and RNA design. With the recent development of molecular systems biology, which aims to integrate different levels of information, studies of protein and nucleic acid assemblies should provide a better understanding of the molecular processes and machinery at work in the cell, and our research extends to several related issues in systems biology.

On the one hand, we study and develop methodological approaches for dealing with macromolecular structures and annotation: the challenge is to develop abstract models that are both computationally tractable and biologically relevant. Our approach puts a strong emphasis on modeling biological objects with classic formalisms from computer science (languages, trees, graphs...), occasionally decorated and/or weighted to capture features of interest. To that end, we rely on the wide array of skills present in our team in combinatorics, formal languages and discrete mathematics. The resulting models are usually designed to be amenable to probabilistic analysis, which can be used to assess the relevance of the models or to test general hypotheses.

On the other hand, once suitable models are established, we apply these computational approaches to several particular problems arising in fundamental molecular biology. We typically aim to design new specialized algorithms and methods that efficiently compute properties of real biological objects. Tools of choice include exact optimization (relying heavily on dynamic programming), simulation, machine learning and discrete mathematics. As a whole, a common toolkit of computational methods is developed within the group. The trade-off between the biological accuracy of a model and its computational tractability or efficiency must be addressed in close partnership with experimental biology groups. One outcome is to provide software or platform elements to predict either structures or structural and functional annotation. As members of the Inria community, we are part of the ADT BioSciences led by J. Nicolas, whose goal is to develop a global Inria Bioinformatics web portal.

Highlights of the Year

Michael Levitt, our international collaborator of the ITSNAP Associated team, was awarded the Nobel Prize in Chemistry for the development of multiscale models for complex chemical systems. The Nobel lecture is available at http://www.nobelprize.org/nobel_prizes/chemistry/laureates/2013/levitt-lecture.html.

The Best application paper at Egc 2013 was awarded to .

Research Program

RNA

At the secondary structure level, we have contributed novel generic techniques applicable to dynamic programming and statistical sampling, and applied them to design new efficient algorithms for probing the conformational space. Another original feature of our approach is that we cover a wide range of scales for RNA structure representation. For each scale (atomic, sequence, secondary and tertiary structure...), cutting-edge algorithmic strategies and accurate, efficient tools have been developed or are under development. This offers a new view of the complexity of RNA structure and function that should provide valuable insights for biological studies.

Work on 3D modeling was supported by the Digiteo project Japarin-3D; work on statistical potentials was supported by Carnage and Itsnap.

Dynamic programming and complexity

Participants: Alain Denise, Yann Ponty, Antoine Soulé

Common activity with J. Waldispühl (McGill).

Ever since the seminal work of Zuker and Stiegler, the field of RNA bioinformatics has been characterized by a strong emphasis on the secondary structure. This discrete abstraction of the 3D conformation of RNA has paved the way for the development of quantitative approaches in RNA computational biology, revealing unexpected connections between combinatorics and molecular biology. Building on our strong background in enumerative combinatorics, we propose generic and efficient dynamic programming algorithms for both sampling and counting structures. These general techniques have been applied to study the sequence-structure relationship , the correction of pyrosequencing errors , , and the efficient detection of multi-stable RNAs (riboswitches) ,.
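As a minimal illustration of counting structures by dynamic programming, the sketch below counts the secondary structures compatible with an RNA sequence under a deliberately simplified, Nussinov-style model. The allowed base pairs and the minimal hairpin length `MIN_HAIRPIN = 3` are assumptions chosen for the example; the team's algorithms rely on richer grammars and energy models.

```python
# Simplified model (assumption): Watson-Crick and wobble pairs, no energies,
# and at least MIN_HAIRPIN unpaired bases enclosed by every base pair.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}
MIN_HAIRPIN = 3


def count_structures(seq: str, min_hairpin: int = MIN_HAIRPIN) -> int:
    """Count the non-crossing secondary structures compatible with `seq`."""
    n = len(seq)
    # C[i][j] = number of structures on the half-open interval seq[i:j];
    # an empty interval (i >= j) admits exactly one structure, the empty one.
    C = [[1] * (n + 1) for _ in range(n + 2)]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            total = C[i + 1][j]  # case 1: position i is unpaired
            # case 2: i pairs with k, splitting the interval into two parts
            for k in range(i + min_hairpin + 1, j):
                if (seq[i], seq[k]) in ALLOWED_PAIRS:
                    total += C[i + 1][k] * C[k + 1][j]
            C[i][j] = total
    return C[0][n]


if __name__ == "__main__":
    print(count_structures("GGAAACC"))  # 6
```

Replacing the integer counts by Boltzmann weights turns the same recursion into a partition function, the usual starting point for statistical sampling of structures.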

RNA design

Participants: Alain Denise, Yann Ponty

Joint project with S. Vialette (Marne-la-Vallée), J. Waldispühl (McGill) and Y. Zhang (Wuhan).

It is natural to build on our understanding of the secondary structure to construct artificial RNAs performing predetermined functions, ultimately targeting therapeutic and synthetic biology applications. Towards this goal, a key element is the design of RNA sequences that fold into a predetermined secondary structure according to established energy models (the inverse-folding problem). Quite surprisingly, despite two decades of study, the computational complexity of the inverse-folding problem is currently unknown.

Within our group, we have proposed a new methodology for this problem, based on weighted random generation and multidimensional Boltzmann sampling. Initially lifting the constraint of folding back into the target structure, we explored the random generation of sequences that are compatible with the target, using a probability distribution that exponentially favors sequences with high affinity towards the target. A simple posterior rejection step then selects the sequences that actually fold into the target, resulting in a global sampling pipeline whose performance was comparable to that of competing methods based on local search .
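The two-stage pipeline (weighted generation of compatible sequences, then rejection) can be sketched under strong simplifying assumptions: each base pair of the target, given in dot-bracket notation, receives an independent weight that stands in for the exponential of a toy, pair-additive energy, so sampling reduces to independent weighted choices rather than the dynamic programming required by stacking-based models. The weights and the `folds_into` predicate are hypothetical; in practice the rejection step would call an actual folding program (for instance RNAfold from the ViennaRNA package).

```python
import random
from typing import Callable, Dict, Optional

# Hypothetical weights: larger values make a pair more likely, mimicking
# exp(-E/RT) for a toy additive energy model (not published parameters).
PAIR_WEIGHTS: Dict[str, float] = {"GC": 8.0, "CG": 8.0, "AU": 3.0,
                                  "UA": 3.0, "GU": 1.0, "UG": 1.0}
UNPAIRED_WEIGHTS: Dict[str, float] = {"A": 1.0, "C": 1.0, "G": 1.0, "U": 1.0}


def parse_pairs(structure: str) -> Dict[int, int]:
    """Map each '(' position of a dot-bracket string to its matching ')'."""
    stack, pairs = [], {}
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs[stack.pop()] = i
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def weighted_choice(weights: Dict[str, float], rng: random.Random) -> str:
    keys = list(weights)
    return rng.choices(keys, weights=[weights[k] for k in keys])[0]


def sample_compatible(structure: str, rng: random.Random) -> str:
    """Draw a sequence compatible with `structure`; its probability is
    proportional to the product of its pair and unpaired-base weights."""
    pairs = parse_pairs(structure)
    seq = [""] * len(structure)
    for i, c in enumerate(structure):
        if c == "(":
            pair = weighted_choice(PAIR_WEIGHTS, rng)
            seq[i], seq[pairs[i]] = pair[0], pair[1]
        elif c == ".":
            seq[i] = weighted_choice(UNPAIRED_WEIGHTS, rng)
    return "".join(seq)


def design(structure: str, folds_into: Callable[[str, str], bool],
           max_tries: int = 1000, seed: int = 0) -> Optional[str]:
    """Return the first sampled sequence accepted by `folds_into`, if any."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        candidate = sample_compatible(structure, rng)
        if folds_into(candidate, structure):  # posterior rejection step
            return candidate
    return None


if __name__ == "__main__":
    # Stand-in predicate (hypothetical) accepting every candidate; a real
    # pipeline compares the target with the structure predicted by a folder.
    print(design("((((....))))", lambda seq, target: True))
```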

Towards 3D modeling of large molecules

Participants: Alain Denise, Mélanie Boudard

Joint project with D. Barth (Versailles) and J. Cohen (Paris-Sud).

The modeling of large RNA 3D structures, that is, predicting the three-dimensional structure of a given RNA sequence, relies on two complementary approaches. The homology approach is used when the structure of a sequence homologous to the sequence of interest has already been resolved experimentally; the main problem is then to compute an alignment between the known structure and the sequence. The ab initio approach is required when no homologous structure is known for the sequence of interest (or for some parts of it). We work in both directions.

Statistical and robotics-inspired models for structure and dynamics

Participants: Julie Bernauer, Rasmus Fonseca

Although it is able to model small globular proteins correctly, the computational structural biology community still lacks efficient force fields and scoring functions for prediction, as well as good sampling and dynamics strategies.

Our current and future efforts towards knowledge-based scoring function and ion location prediction have been described in .

Over the last two decades, a strong connection between robotics and computational structural biology has emerged, in which the internal coordinates of proteins are interpreted as a kinematic linkage, with rotatable bonds as joints and the corresponding groups of atoms as links , , , . Initially, only protein fragments of tens of residues were modeled as kinematic linkages, but this approach has since been extended to encompass (multi-domain) proteins . For RNA, progress in this direction has been made as well. A kinematics-based conformational sampling algorithm for loops, KGS, was recently developed , but it does not fully exploit the potential of a kinematic model. It breaks and recloses loops using six torsional degrees of freedom, which yields a finite number of solutions. The discrete nature of this solution set in conformational space makes it difficult to optimize a target function by gradient descent. Our methods overcome this limitation by performing conformational sampling and optimization in a co-dimension 6 subspace, in which fragments remain closed; however, these methods are limited to proteins. Our objective is to extend the approach proposed in , to nucleic acids and protein/nucleic acid complexes, with a view to improving the structure determination of nucleic acids and their complexes, as well as in silico docking experiments on protein/RNA complexes. For that purpose, we have developed a generic strategy for differentiable statistical potentials , that can be directly integrated into the procedure.
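The kinematic-linkage view can be made concrete with the standard NeRF (Natural Extension Reference Frame) construction, which builds Cartesian coordinates of a chain from internal coordinates (bond lengths, bond angles, torsions). The sketch below is not the team's implementation and ignores branching side chains; it only shows that each torsion behaves as a joint: changing it moves every downstream atom rigidly, while upstream atoms and all bond lengths stay unchanged.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec) -> Vec:
    return _scale(a, 1.0 / math.sqrt(_dot(a, a)))


def place_atom(a: Vec, b: Vec, c: Vec, bond: float, angle: float, torsion: float) -> Vec:
    """Place D so that |CD| = bond, angle BCD = angle, dihedral ABCD = torsion."""
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))  # normal of the plane ABC
    m = _cross(n, bc)                  # completes the right-handed frame
    d = (-bond * math.cos(angle),
         bond * math.sin(angle) * math.cos(torsion),
         bond * math.sin(angle) * math.sin(torsion))
    return _add(c, _add(_scale(bc, d[0]), _add(_scale(m, d[1]), _scale(n, d[2]))))


def build_chain(lengths: Sequence[float], angles: Sequence[float],
                torsions: Sequence[float]) -> List[Vec]:
    """lengths[i]: bond i-(i+1); angles[i]: angle at atom i+1; torsions[i]: dihedral i..i+3."""
    n_atoms = len(lengths) + 1
    if len(angles) != n_atoms - 2 or len(torsions) != n_atoms - 3:
        raise ValueError("inconsistent internal coordinates")
    pts: List[Vec] = [(0.0, 0.0, 0.0), (lengths[0], 0.0, 0.0)]
    pts.append(_add(pts[1], (-lengths[1] * math.cos(angles[0]),
                             lengths[1] * math.sin(angles[0]), 0.0)))
    for k in range(3, n_atoms):
        pts.append(place_atom(pts[k - 3], pts[k - 2], pts[k - 1],
                              lengths[k - 1], angles[k - 2], torsions[k - 3]))
    return pts


def dihedral(a: Vec, b: Vec, c: Vec, d: Vec) -> float:
    """Signed dihedral angle ABCD in radians (IUPAC convention)."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.atan2(y, x)
```

Changing `torsions[1]` in `build_chain` leaves the first four atoms untouched and rotates the rest of the chain about the corresponding bond, which is the joint-like behaviour exploited by kinematics-based sampling.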

Results from in silico docking experiments will also directly benefit the structure determination of complexes which, in turn, will provide structural insights into nucleic acid and protein/nucleic acid complexes. Starting from a small proof-of-concept implementation of the KGS strategy for single-chain proteins, we have developed a robust preliminary implementation that can handle RNA and will be further developed to account for multi-chain molecules. Rasmus Fonseca, post-doctoral scholar in the project, is currently performing an extensive computational and biological validation.

Sequences

Participants: Julie Bernauer, Alain Denise, Mireille Régnier, Yann Ponty, Jean-Marc Steyaert, Daria Iakovishina, Antoine Soulé

String searching and pattern matching form a classical area of computer science, enhanced by potential applications to genomic sequences. In the Cpm/Spire community, the focus is on general string algorithms and associated data structures, together with their theoretical complexity. Our group specializes in a formalization based on languages, weighted by a probabilistic model. Team members share expertise in the enumeration and random generation of combinatorial sequences or structures that are admissible according to given constraints. Special attention is paid to the actual computability of formulae and to the efficiency of the designed data structures, which may be reused in external software.

As a whole, motif detection in genomic sequences is a hot topic in computational biology that makes it possible to address key questions such as chromosome dynamics or annotation. This area is being renewed by high-throughput data and assembly issues. New constraints, such as energy conditions, or technology-dependent sequencing errors and amplification bias, must be introduced into the models. Another aim is to combine statistical sampling with a fragment-based approach for decomposing structures, such as the cycle decomposition used within F. Major's group . More generally, our methods for sampling and sequence data analysis should be extended in the future to take such constraints, which are continuously evolving, into account.

Combinatorics of motifs

Participants: Mireille Régnier, Daria Iakovishina

Besides applications of analytic combinatorics to computational biology problems, the team addressed general combinatorial problems on words and fundamental issues on languages and data structures.

Molecular interactions often involve specific motifs: one may cite protein-DNA (cis-regulation), protein-protein (docking) and RNA-RNA (miRNA, frameshift, circularisation) interactions. Motif detection combines an algorithmic search for potential sites with a significance assessment, which requires a quantitative criterion. It is generally accepted that the p-value is a reliable tool that outperforms older criteria such as the z-score. Amib conducts long-term research on word combinatorics. In recent years, a general scheme for deriving analytic formulae for the p-value under different constraints ($k$-occurrence, first occurrence, overrepresentation in large sequences,...) has been provided. It relies on a representation of word overlaps in a graph . Recursive equations for computing p-values may be reduced to a traversal of that graph, leading to a linear algorithm. This derivation of p-values decreases the space and time complexity compared with the generating function approach or earlier probabilistic weighted automata.
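To illustrate how a recursion over word overlaps yields an exact p-value in time linear in the text length, the sketch below computes $P(N_n \geq k)$, the probability that a word occurs at least $k$ times (overlaps allowed) in a random text of length $n$ drawn from an i.i.d. (Bernoulli) letter model. It uses a standard string-matching automaton (KMP) instead of the team's overlap-graph formalism, so it is an illustrative stand-in; the letter probabilities are assumptions supplied by the caller.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def kmp_automaton(word: str, alphabet: List[str]) -> List[Dict[str, int]]:
    """delta[s][c]: state reached from state s (length of the longest prefix
    of `word` that is a suffix of the text read so far) on letter c."""
    m = len(word)
    delta: List[Dict[str, int]] = [dict() for _ in range(m + 1)]
    for c in alphabet:
        delta[0][c] = 1 if word[0] == c else 0
    restart = 0  # state reached after reading word[1:s]; encodes self-overlaps
    for s in range(1, m + 1):
        for c in alphabet:
            if s < m and word[s] == c:
                delta[s][c] = s + 1
            else:
                delta[s][c] = delta[restart][c]
        if s < m:
            restart = delta[restart][word[s]]
    return delta


def prob_at_least_k(word: str, probs: Dict[str, float], n: int, k: int) -> float:
    """P(word occurs at least k times in an i.i.d. random text of length n)."""
    delta = kmp_automaton(word, list(probs))
    m = len(word)
    # distribution over (automaton state, occurrences so far, capped at k)
    dist: Dict[Tuple[int, int], float] = {(0, 0): 1.0}
    for _ in range(n):
        nxt: Dict[Tuple[int, int], float] = defaultdict(float)
        for (s, cnt), p in dist.items():
            for c, pc in probs.items():
                t = delta[s][c]
                nxt[(t, min(k, cnt + (t == m)))] += p * pc
        dist = nxt
    return sum(p for (s, cnt), p in dist.items() if cnt >= k)
```

Capping the counter at $k$ keeps the number of tracked states at most $(|w|+1)(k+1)$, so the cost grows linearly with $n$.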

Meanwhile, continuous sequences of overlapping words, commonly called clumps or clusters, turn out to be crucial in counting words in random texts. Notably, they play a fundamental role in the Chen-Stein method of compound Poisson approximation. A first characterization was proposed by Nicodème et al., and this work is currently being extended.
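The notions involved can be made concrete with a short sketch: the periods (self-overlap shifts) of a word determine whether its occurrences can overlap at all, and a clump is a maximal run of occurrences in which each one overlaps the previous one. This only illustrates the definitions; it is not the characterization of Nicodème et al.

```python
from typing import List


def periods(word: str) -> List[int]:
    """Shifts p (0 < p < len(word)) at which `word` overlaps itself."""
    m = len(word)
    return [p for p in range(1, m) if word[p:] == word[:m - p]]


def clumps(text: str, word: str) -> List[List[int]]:
    """Group the occurrence positions of `word` in `text` into clumps:
    maximal runs of occurrences in which each overlaps the previous one."""
    m = len(word)
    positions = [i for i in range(len(text) - m + 1) if text.startswith(word, i)]
    result: List[List[int]] = []
    for pos in positions:
        if result and pos - result[-1][-1] < m:  # overlaps previous occurrence
            result[-1].append(pos)
        else:
            result.append([pos])
    return result
```

For instance, `periods("abc")` is empty, so every clump of `abc` contains a single occurrence, whereas `aa` (period 1) forms the clump `[0, 1, 2]` in `aaaabaa`.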

This research area is widened by new problems arising from de novo genome assembly or re-assembly. For example, the unique mappability of short reads strongly depends on the repetition of words. Although average values of the relevant lengths have long been studied under different constraints, their distribution or profile remained unknown until a seminal paper that provides formulae for binary tries. A collaboration has been started with Lob at Ecole Polytechnique to check these formulae on real data, namely archaeal genomes (internship of J. Moussu).
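As an illustration of how unique mappability depends on repeated words, the sketch below computes, for each position of a genome string, the shortest read starting there that occurs nowhere else in the string. This value is the depth of the corresponding leaf in the suffix trie, that is, one more than the longest common prefix shared with any other suffix, which can be read off the lexicographic neighbours in the sorted list of suffixes. The quadratic suffix sort is a simplification for readability; real genomes call for suffix arrays or FM-indexes built in (near-)linear time.

```python
from typing import List, Optional


def lcp(text: str, i: int, j: int) -> int:
    """Length of the longest common prefix of suffixes text[i:] and text[j:]."""
    k = 0
    while i + k < len(text) and j + k < len(text) and text[i + k] == text[j + k]:
        k += 1
    return k


def min_unique_read_length(text: str) -> List[Optional[int]]:
    """For each start position, the shortest unique read length
    (None if every read starting there also occurs elsewhere)."""
    n = len(text)
    order = sorted(range(n), key=lambda i: text[i:])  # naive suffix array
    result: List[Optional[int]] = [None] * n
    for rank, i in enumerate(order):
        shared = 0  # longest prefix shared with any other suffix
        if rank > 0:
            shared = max(shared, lcp(text, order[rank - 1], i))
        if rank + 1 < n:
            shared = max(shared, lcp(text, i, order[rank + 1]))
        # A read of length shared + 1 is unique, provided it fits in the text.
        result[i] = shared + 1 if i + shared + 1 <= n else None
    return result
```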

As a second example, numerous new assembly algorithms have recently appeared. However, comparing the results of these different algorithms has revealed significant differences for a given genome assembly. Clearly, strong constraints from the underlying technologies, which lead to different data (size, confidence,...), are one origin of the problem, and a deeper interpretation is needed in order to improve algorithms and confidence in the results. One objective is to develop an error model, including a statistical model, that takes into account the quality of the data produced by the different technologies and their volume. This is the subject of an international collaboration with V. Makeev's lab (IoGene, Moscow) and the Magnome project-team. Third, Next Generation Sequencing opens the way to the study of structural variants in the genome, as recently described in . Defining a probabilistic model that takes into account the main dependencies, such as the GC content, is one task of D. Iakovishina's thesis, in collaboration with V. Boeva (Curie Institute).

Random generation

Participants: Alain Denise, Yann Ponty

Analytical methods may fail when both sequential and structural constraints on sequences are to be modelled or, more generally, when molecular structures such as RNA structures have to be handled. The random generation of combinatorial objects is a natural alternative framework for assessing the significance of observed phenomena. General and efficient techniques have been developed over the last decades to draw objects uniformly at random from an abstract specification. However, in the context of biological sequences and structures, the uniformity assumption becomes unrealistic, and one has to consider non-uniform distributions in order to derive relevant estimates. Typically, context-free grammars can handle certain kinds of long-range interactions, such as base pairings in RNA secondary structures.

In 2005, a new paradigm appeared in ab initio secondary structure prediction : instead of formulating the problem as a classic optimization, this new approach uses statistical sampling within the space of solutions. Besides giving more robust results, it allows a fruitful adaptation of tools and algorithms derived in a purely combinatorial setting. Indeed, we have recently made significant and original progress in this area , , including combinatorial models for structures with pseudoknots. Our aim is to combine this paradigm with a fragment-based approach for decomposing structures, such as the cycle decomposition used within F. Major's group .
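The sampling paradigm can be sketched on a toy energy model in which every base pair contributes a fixed free energy, so that the Boltzmann weight of a structure is a product of per-pair factors $e^{-E/RT}$. The partition function is computed with the same context-free decomposition as for counting, and structures are then drawn by stochastic backtracking, each choice being made with probability proportional to its contribution to the partition function. The pair energies below are hypothetical and far cruder than the nearest-neighbour model used by real prediction tools.

```python
import math
import random
from typing import List

RT = 0.6163  # kcal/mol, approximately R * 310.15 K (37 degrees C)
# Hypothetical pair-additive free energies (kcal/mol), for illustration only.
PAIR_ENERGY = {("G", "C"): -3.0, ("C", "G"): -3.0, ("A", "U"): -2.0,
               ("U", "A"): -2.0, ("G", "U"): -1.0, ("U", "G"): -1.0}


def pair_weight(a: str, b: str) -> float:
    e = PAIR_ENERGY.get((a, b))
    return 0.0 if e is None else math.exp(-e / RT)


def partition_table(seq: str, min_hairpin: int = 3) -> List[List[float]]:
    """Z[i][j] = sum of the Boltzmann weights of all structures on seq[i:j]."""
    n = len(seq)
    Z = [[1.0] * (n + 1) for _ in range(n + 2)]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            z = Z[i + 1][j]  # i unpaired
            for k in range(i + min_hairpin + 1, j):  # i paired with k
                z += pair_weight(seq[i], seq[k]) * Z[i + 1][k] * Z[k + 1][j]
            Z[i][j] = z
    return Z


def sample_structure(seq: str, rng: random.Random, min_hairpin: int = 3) -> str:
    """Draw one structure (dot-bracket) from the Boltzmann distribution."""
    Z = partition_table(seq, min_hairpin)
    out = ["."] * len(seq)
    stack = [(0, len(seq))]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        r = rng.random() * Z[i][j] - Z[i + 1][j]
        if r < 0:  # leave i unpaired
            stack.append((i + 1, j))
            continue
        chosen = None
        for k in range(i + min_hairpin + 1, j):
            w = pair_weight(seq[i], seq[k]) * Z[i + 1][k] * Z[k + 1][j]
            if w == 0.0:
                continue
            chosen = k  # also serves as a fallback against round-off
            r -= w
            if r < 0:
                break
        if chosen is None:
            stack.append((i + 1, j))
            continue
        out[i], out[chosen] = "(", ")"
        stack.extend([(i + 1, chosen), (chosen + 1, j)])
    return "".join(out)
```

Plain floating-point weights overflow for long sequences; practical implementations rescale the partition function or work in log space.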

Besides, our work on random generation is also applied in a different field, namely software testing and model-checking, in a continuing collaboration with the Fortesse group at Lri ,.

Geometry and machine learning for 3D interaction prediction

Participants: Julie Bernauer, Jean-Marc Steyaert, Christine Froidevaux, Jérôme Azé, Adrien Guilhot-Gaudeffroy

The biological function of macromolecules such as proteins and nucleic acids relies on their dynamic structural nature and their ability to interact with many different partners. This is especially challenging, as structural flexibility is key, and multi-scale modelling , and efficient code are essential .

Our project covers various aspects of the modelling and analysis of biological macromolecule structures and interactions. First, protein structure prediction is addressed through combinatorics. The dynamics of these structures is also studied using statistical and robotics-inspired strategies. Both provide a good starting point for 3D interaction modelling, since accurate structure and dynamics are essential. Modelling is then extended to the cell level by studying large protein interaction networks and the dynamics of molecular pathways.

Our group benefits from a good collaboration network, mainly with Stanford University (USA), Hkust (Hong Kong) and McGill (Canada). Computational expertise in this field of computational structural biology is concentrated in a few large groups worldwide (e.g. Pande lab at Stanford, Baker lab at U. Washington) that have both dry and wet labs. We also contributed to the Capri experiment, organized by leading members of an international community in which we have been involved for some time . At Inria, our interest in structural biology is shared by the Abs project-team. Work by D. Ritchie in the Orpailleur project-team (see ) led to a joint publication with T. Bourquard and J. Azé. Our activities are, however, now centered more on protein-nucleic acid interactions, multi-scale analysis, robotics-inspired strategies and machine learning than on protein-protein interactions, algorithms and geometry. We also share an interest in large biomolecules and their dynamics with the Nano-D project-team and its adaptive sampling strategy. As a whole, we contribute to the development of geometric and machine learning strategies for macromolecular docking.

Combinatorial models for the structure of proteins

Protein structure prediction has been and still is extensively studied. Computational approaches have shown interesting results for globular proteins but transmembrane proteins remain a difficult case.

Transmembrane beta-barrel proteins (TMB) account for 20 to 30% of identified proteins in a genome but, due to difficulties with standard experimental techniques, they represent only 2% of the RCSB Protein Data Bank. As TMB perform many vital functions, predicting their structure is a challenge for the life sciences, while the small number of known structures prohibits knowledge-based methods for structure prediction.

As barrel proteins are strongly structured objects, model-based methodologies are an interesting alternative to these conventional methods. Jérôme Waldispühl's thesis at Lix opened this track for the common case where a protein folds respecting the order of the sequence, leaving a structure in which each strand is bound to the preceding and succeeding ones. The matching constraints were expressed by a grammatical model, for which relatively simple dynamic programming schemes exist.

However, more sophisticated schemes are required when the arrangement of the strands along the barrel does not follow their order in the sequence, as is the case for Greek key or Jelly roll motifs. The prediction algorithm may then be driven by a permutation on the order of the bonded strands. In his thesis , Van Du Tran developed a methodology for compiling a given permutation into a dynamic programming scheme that predicts the folding of sequences into the corresponding TMB secondary structure. Polynomial upper bounds on the complexity follow from the resulting DP scheme. Better schemes, based on tree decompositions of the graph that expresses the constraints between strands in the barrel, were investigated in .

The efficiently obtained 3D structures provide a good model for further 3D and interaction analyses.

3D interaction prediction

To better model complexes, various aspects of the scoring problem for protein-protein docking need to be addressed . It is also of great interest to introduce a hierarchical analysis of the original complex three-dimensional structures used for learning, obtained by clustering.

A protein-protein docking procedure traditionally consists of two successive tasks: a search algorithm generates a large number of candidate solutions, and then a scoring function is used to rank them in order to extract a native-like conformation. We demonstrated that, using Voronoi constructions and a defined set of parameters, we could optimize an accurate scoring function and interaction detection . We also focused on developing other geometric constructions for that purpose: being related to the Voronoi construction, the Laguerre tessellation was expected to better represent the physico-chemical properties of the partners. It also allows fast computation without losing the intrinsic properties of the biological objects. In , we compare both constructions. We also worked on introducing a hierarchical analysis of the original complex three-dimensional structures used for learning, obtained by clustering. Using this clustering model, combined with a strong emphasis on the design of efficient complex filters and collaborative filtering, we can optimize the scoring functions and obtain more accurate solutions .
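The difference between the two constructions shows up on a single query point: a Voronoi cell gives space to the nearest atom centre, whereas a Laguerre (power) cell uses the power distance $\|x-c\|^2 - r^2$, so that atoms with larger radii claim more space. The sketch below only classifies query points; computing full tessellations and deriving interface parameters for scoring are out of its scope, and the coordinates and radii in the usage note are made up.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def voronoi_owner(x: Point, centers: Sequence[Point]) -> int:
    """Index of the site whose Voronoi cell contains x (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(x, centers[i]))


def laguerre_owner(x: Point, centers: Sequence[Point], radii: Sequence[float]) -> int:
    """Index of the site whose Laguerre cell contains x, using the
    power distance |x - c|^2 - r^2 (larger atoms claim more space)."""
    return min(range(len(centers)),
               key=lambda i: math.dist(x, centers[i]) ** 2 - radii[i] ** 2)
```

With a large atom of radius 2.0 at the origin and a small one of radius 0.5 at (3, 0, 0), the point (1.6, 0, 0) lies in the Voronoi cell of the small atom but in the Laguerre cell of the large one. With equal radii both functions always agree, since the power distance then differs from the squared Euclidean distance by a constant.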

We also decided to extend these techniques to the analysis of protein-nucleic acid complexes. The first preliminary developments and tests are performed by A. Guilhot (See figure ).

Data Integration

Participants: Christine Froidevaux, Alain Denise, Sarah Cohen-Boulakia, Bryan Brancotte, Jiuqiang Chen

Faced with the inherent features of biological and biomedical data, researchers from the database and artificial intelligence communities have joined together to form a community dedicated to the specific problems posed by integrating life sciences data. With the deluge of newly sequenced genomes and the amount of data produced by high-throughput approaches, the need to cross and compare massive and heterogeneous data is more important than ever to improve functional annotation and design biological networks. Challenges are numerous. One may cite the need to help scientists perform and share complex and reproducible biological analyses. Special attention is paid to the more specific domains of scientific workflow management and biological data ranking. We aim to explore the relationships between these two domains, from the various specific problems posed by ranking scientific workflows to the problem of building consensus workflows.

Designing and Comparing Scientific workflows

Participants: Christine Froidevaux, Sarah Cohen-Boulakia, Jiuqiang Chen

Scientific workflow management systems are increasingly used to specify and manage bioinformatics experiments. Their programming model appeals to bioinformaticians, who use them to easily specify complex data processing pipelines. Such a model is underpinned by a graph structure, where nodes represent bioinformatics tasks and links represent the dataflow. As underlined both in a study and in a review of existing approaches, the complexity of such graph structures is increasing over time, making them more difficult to share and reuse.

One of the major current challenges is thus to provide means of reducing the structural complexity of workflows while ensuring that no structural transformation affects the executions of the transformed workflows, that is, while preserving provenance.
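What "preserving provenance" requires can be illustrated on a deliberately simplified model of our own: a workflow is a directed acyclic graph of tasks (task to downstream tasks), provenance is reduced to the set of source tasks each final output depends on, and the only transformation considered fuses a linear chain u → v (u feeds only v, and v is fed only by u) into one composite task. Task names are hypothetical; real provenance models track much finer information, such as individual data items and execution traces.

```python
from typing import Dict, Set

Workflow = Dict[str, Set[str]]  # task -> downstream tasks (dataflow edges)


def predecessors(wf: Workflow, node: str) -> Set[str]:
    return {u for u, succ in wf.items() if node in succ}


def upstream_sources(wf: Workflow, node: str) -> Set[str]:
    """Source tasks (no predecessor) whose data flows into `node`."""
    seen, stack, sources = set(), [node], set()
    while stack:
        x = stack.pop()
        if x in seen:
            continue
        seen.add(x)
        preds = predecessors(wf, x)
        if not preds:
            sources.add(x)
        stack.extend(preds)
    return sources


def sink_provenance(wf: Workflow) -> Dict[str, Set[str]]:
    """Simplified provenance: each final output -> sources it depends on."""
    return {x: upstream_sources(wf, x) for x, succ in wf.items() if not succ}


def merge_chain(wf: Workflow, u: str, v: str) -> Workflow:
    """Fuse the chain u -> v into a single composite task "u+v"."""
    if wf.get(u) != {v} or predecessors(wf, v) != {u}:
        raise ValueError(f"{u} -> {v} is not a simple chain")
    merged = f"{u}+{v}"
    new: Workflow = {}
    for x, succ in wf.items():
        if x not in (u, v):
            # edges that entered u now enter the composite task
            new[x] = {merged if s == u else s for s in succ}
    new[merged] = set(wf[v])
    return new
```

Comparing `sink_provenance` before and after a transformation gives a cheap sanity check; fusing `align` and `trim` in a pipeline `fetch → align → trim → tree` (with a second input `ref → tree`) leaves the provenance of `tree` equal to `{fetch, ref}`.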

Ranking biological data

Participants: Alain Denise, Sarah Cohen-Boulakia, Bryan Brancotte

We address the increasing number of available resources. The BioGuide project aims to help users navigate the maze of available biological sources. More recently, a second problem was tackled: the number of answers returned by even a single queried biological resource may be too large for the user to deal with. We have provided solutions for ranking biological data. The main difficulty lies in considering various ranking criteria (recent data first, popular data first, curated data first...). Many approaches combine ranking criteria into a ranking function, possibly leading to arbitrary choices in the way the criteria are combined. Instead, in collaboration with the University of Montreal, we have proposed a median ranking approach named BioConsert (for generating Biological Consensus ranking with ties): it considers as many rankings as there are ranking criteria for the same data set, and provides a consensus ranking that minimizes the disagreements between the input rankings. We have shown the benefit of using median ranking in several biological settings.
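The median-ranking idea can be sketched as follows. Rankings with ties are represented as ordered lists of buckets; the distance between two rankings counts, over all pairs of elements, a cost of 1 when the pair is ordered oppositely and a cost `tie_penalty` when it is tied in one ranking but not in the other (a generalized Kendall-tau distance). The consensus is searched by local moves (moving one element into another bucket or into a new bucket), starting from each input ranking. This is a simplified sketch in the spirit of BioConsert, not its published implementation; the default penalty and the assumption that all rankings cover the same elements are ours.

```python
from itertools import combinations
from typing import Dict, Iterator, List, Set, Tuple

Ranking = List[Set[str]]  # ordered buckets; elements of one bucket are tied


def bucket_of(r: Ranking) -> Dict[str, int]:
    return {x: b for b, bucket in enumerate(r) for x in bucket}


def distance(r1: Ranking, r2: Ranking, tie_penalty: float = 1.0) -> float:
    """Generalized Kendall-tau distance between two rankings with ties."""
    p1, p2 = bucket_of(r1), bucket_of(r2)
    d = 0.0
    for a, b in combinations(sorted(p1), 2):
        s1 = (p1[a] > p1[b]) - (p1[a] < p1[b])  # -1, 0 (tied) or +1
        s2 = (p2[a] > p2[b]) - (p2[a] < p2[b])
        if s1 != s2:
            d += 1.0 if s1 * s2 == -1 else tie_penalty
    return d


def neighbours(r: Ranking) -> Iterator[Ranking]:
    """Rankings obtained by moving one element to another or a new bucket."""
    for src, bucket in enumerate(r):
        for x in bucket:
            base = [set(b) for b in r]
            base[src].discard(x)
            for t in range(len(r)):  # join an existing bucket
                if t != src:
                    cand = [set(b) for b in base]
                    cand[t].add(x)
                    yield [b for b in cand if b]
            for t in range(len(r) + 1):  # or form a new singleton bucket
                cand = [set(b) for b in base]
                cand.insert(t, {x})
                yield [b for b in cand if b]


def consensus(rankings: List[Ranking], tie_penalty: float = 1.0) -> Tuple[Ranking, float]:
    def cost(c: Ranking) -> float:
        return sum(distance(c, r, tie_penalty) for r in rankings)

    best, best_cost = None, float("inf")
    for start in rankings:
        cur = [set(b) for b in start if b]
        cur_cost = cost(cur)
        improved = True
        while improved:  # first-improvement local search
            improved = False
            for cand in neighbours(cur):
                c = cost(cand)
                if c < cur_cost:
                    cur, cur_cost, improved = cand, c, True
                    break
        if cur_cost < best_cost:
            best, best_cost = cur, cur_cost
    return best, best_cost
```

Local search only guarantees a local optimum; starting from every input ranking is a cheap way to reduce the risk of a poor one.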

Additionally, in a close collaboration with the Institut Curie, we have also developed the GeneValorization tool that ranks a list of genes of interest given as input with respect to a set of keywords representing the context of study. Here the single ranking criterion considered for each gene is the number of publications in PubMed co-citing the gene name and the keywords. The tool is able to make use of the MeSH taxonomy when considering the keywords and the dictionary of gene names and aliases for the gene names.
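The single-criterion ranking can be sketched as below. `rank_genes` sorts genes by a co-citation count and gives tied genes the same rank; the count itself is delegated to a caller-supplied function. `pubmed_count` shows one possible way to obtain such counts through the NCBI E-utilities `esearch` service, but the query form `gene AND (kw1 OR kw2 ...)` is our assumption, and the MeSH expansion and alias dictionary used by GeneValorization are not reproduced.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List, Sequence, Tuple

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_count(gene: str, keywords: Sequence[str]) -> int:
    """Number of PubMed records matching the gene name and any keyword
    (requires network access; the query syntax is an assumption)."""
    term = f"{gene} AND ({' OR '.join(keywords)})"
    query = urllib.parse.urlencode({"db": "pubmed", "term": term,
                                    "retmode": "json", "retmax": 0})
    with urllib.request.urlopen(f"{ESEARCH}?{query}") as response:
        return int(json.load(response)["esearchresult"]["count"])


def rank_genes(genes: Sequence[str], keywords: Sequence[str],
               count: Callable[[str, Sequence[str]], int] = pubmed_count
               ) -> List[Tuple[int, str, int]]:
    """Return (rank, gene, count) tuples; equal counts share the same rank."""
    counts: Dict[str, int] = {g: count(g, keywords) for g in genes}
    ordered = sorted(genes, key=lambda g: -counts[g])
    ranked: List[Tuple[int, str, int]] = []
    rank, previous = 0, None
    for position, g in enumerate(ordered, start=1):
        if counts[g] != previous:  # a new count value starts a new rank
            rank, previous = position, counts[g]
        ranked.append((rank, g, counts[g]))
    return ranked
```

Passing a stub `count` function (for instance a dictionary lookup) makes it possible to test the ranking logic offline.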

Systems Biology

Participants: Patrick Amar, Sarah Cohen-Boulakia, Alain Denise, Christine Froidevaux, Loic Paulevé, Sabine Pérès, Laurent Schwartz, Jean-Marc Steyaert, Erwan Bigan, Adrien Rougny

Systems Biology involves the systematic study of complex interactions in biological systems using an integrative approach. The goal is to find new emergent properties that may arise from the systemic view in order to understand the wide variety of processes that happen in a biological system. Systems Biology activity can be seen as a cycle composed of theory, computational modelling to propose a hypothesis about a biological process, experimental validation, and use of the experimental results to refine or invalidate the computational model (or even the whole theory). During the past five years, new questions and research domains have been identified, and some members of the team have reoriented a part of their activities on these questions.

Three main types of problems have been studied: metabolic networks, signaling networks and, more recently, synthetic biology. Networks have become popular since many crucial problems coming from biology, medicine and pharmacology are nowadays stated in these terms: a great number of them arise from the cancer phenomenon and from the wish to improve our understanding of it in order to propose more efficient therapies. Metabolism has received the most attention, since it concerns a wide variety of topics and many methods have been proposed for it. Depending on the nature of the biological problem, several kinds of methods can be used: discrete deterministic, stochastic, combinatorial, up to continuous differential ones. The recent rise of synthetic biology raises similar challenges, aiming for instance at improving the production of energy by means of biological systems or at obtaining more efficient drug treatments.

Topological analysis of metabolic networks

Participant: Sabine Pérès

Elementary flux mode analysis is a powerful tool for the theoretical study of simple metabolic networks. However, when networks are complex, the number of elementary flux modes explodes combinatorially, which prevents one from drawing simple conclusions from their analysis. Our approach to this problem classifies elementary flux modes into a few classes whose members share a set of common reactions, called common motifs.
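To illustrate what grouping modes by common motifs means, the sketch below represents each elementary flux mode by its set of active reactions and greedily merges the two classes whose shared reaction set is largest, until a requested number of classes remains; the motif of a class is the set of reactions common to all its modes. This greedy agglomeration is a hypothetical simplification for exposition, not the published classification method, and the reaction names in the usage note are made up.

```python
from typing import FrozenSet, List, Tuple

Mode = FrozenSet[str]  # an elementary flux mode, as its set of active reactions


def group_by_common_motifs(modes: List[Mode], n_classes: int) -> List[Tuple[List[int], Mode]]:
    """Return (mode indices, common motif) for each of the final classes."""
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    classes = [([i], frozenset(m)) for i, m in enumerate(modes)]
    while len(classes) > n_classes:
        best = None  # (size of the shared motif, index a, index b)
        for a in range(len(classes)):
            for b in range(a + 1, len(classes)):
                shared = len(classes[a][1] & classes[b][1])
                if best is None or shared > best[0]:
                    best = (shared, a, b)
        _, a, b = best
        merged = (classes[a][0] + classes[b][0], classes[a][1] & classes[b][1])
        classes = [c for k, c in enumerate(classes) if k not in (a, b)] + [merged]
    return classes
```

For the modes {r1, r2, r3}, {r1, r2, r4}, {r5, r6} and {r5, r6, r7}, asking for two classes yields the motifs {r1, r2} and {r5, r6}.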

Signaling networks

Participants: Sarah Cohen-Boulakia, Christine Froidevaux, Adrien Rougny

Signaling pathways involving G protein-coupled receptors (GPCR) are excellent targets in pharmacogenomics research. Large amounts of experimental data are available in this context, yet globally interpreting all of them remains a very challenging task for biologists. Our goal is to help the understanding of signaling pathways involving GPCR and to provide means of semi-automatically constructing signaling networks.

We have introduced a logic-based method to infer molecular networks and shown how it allows signaling networks to be inferred from the design of a knowledge base. The provenance of inferred data has been carefully collected, allowing quality evaluation. Our method (i) takes into account various kinds of biological experiments and their origin; (ii) mimics the scientist's reasoning within a first-order logic setting; (iii) specifies precisely the kind of interaction between the molecules; (iv) provides the user with the provenance of each interaction; (v) automatically builds and draws the inferred network .
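A minimal forward-chaining engine conveys the flavour of this kind of inference: observed facts and Horn-style rules (Datalog-like, with variables written `?x`) are saturated until no new fact appears, and every derived fact records the rule and the premises that produced it, which is what makes the provenance of each inferred interaction available. The predicates, rules and molecule names used in the usage note are hypothetical illustrations, not the team's knowledge base or rule set.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Fact = Tuple[str, ...]
# A rule: (name, premise patterns, conclusion pattern). Every variable of the
# conclusion must also appear in the premises (range-restricted rules).
Rule = Tuple[str, List[Fact], Fact]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Fact, fact: Fact, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend `binding` so that `pattern` equals `fact`, or return None."""
    if len(pattern) != len(fact):
        return None
    b = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if b.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return b


def all_matches(premises: List[Fact], facts: List[Fact], binding: Dict[str, str],
                used: List[Fact]) -> Iterator[Tuple[Dict[str, str], List[Fact]]]:
    """Enumerate bindings satisfying all premises, with the facts used."""
    if not premises:
        yield binding, used
        return
    for fact in facts:
        b = match(premises[0], fact, binding)
        if b is not None:
            yield from all_matches(premises[1:], facts, b, used + [fact])


def forward_chain(observed: Dict[Fact, str], rules: List[Rule]) -> Dict[Fact, tuple]:
    """Saturate the fact base and map each fact to its provenance:
    ("observed", source) or (rule name, premises used)."""
    provenance: Dict[Fact, tuple] = {f: ("observed", src) for f, src in observed.items()}
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            for binding, used in all_matches(premises, list(provenance), {}, []):
                derived = tuple(binding.get(t, t) if is_var(t) else t for t in conclusion)
                if derived not in provenance:
                    provenance[derived] = (name, tuple(used))
                    changed = True
    return provenance
```

For instance, from the observed facts `("inhibitor_of", "I1", "PKA")` and `("inhibition_decreases", "I1", "PKA", "ERK")`, a rule stating that an inhibitor of X decreasing Y implies that X activates Y derives `("activates", "PKA", "ERK")` and records both observations as its provenance.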

Note that a logic-based formalisation is also used in some works carried out in the Inria team Dyliss. The aim of Amib is different, as the design of the network relies on a knowledge-based system describing experimental facts and ontological relationships on background knowledge, together with a set of generic and expressive rules that mimic the expert's reasoning.

This is a collaboration with A. Poupon (Inra-Bios, Tours) that was supported by an Inra-Inria starting grant in 2011-2012.

Modelling and Simulation

Participants: Patrick Amar, Sarah Cohen-Boulakia, Loic Paulevé, Laurent Schwartz, Jean-Marc Steyaert, Erwan Bigan

A great number of methods have been proposed for studying the behavior of large biological systems. We use three of them: the first is based on a discrete and direct simulation of the various interactions between the reactants using an entity-centered approach; the second implements a very efficient variant of the Gillespie stochastic algorithm that can be mixed with the entity-centered method to get the best of both worlds; the third uses differential equations automatically generated from the set of reactions defining the network.
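For reference, the basic form of Gillespie's stochastic simulation algorithm (the "direct method") is sketched below for mass-action reactions. Hsim implements its own, more efficient variant, coupled with an entity-centered simulation; this sketch reproduces neither, and the reaction format (rate constant, reactant and product stoichiometries) is an assumption made for the example.

```python
import math
import random
from typing import Dict, List, Tuple

# (rate constant, reactant stoichiometry, product stoichiometry)
Reaction = Tuple[float, Dict[str, int], Dict[str, int]]


def propensity(k: float, reactants: Dict[str, int], state: Dict[str, int]) -> float:
    a = k
    for species, s in reactants.items():
        a *= math.comb(state[species], s)  # number of distinct reactant combinations
    return a


def gillespie(state: Dict[str, int], reactions: List[Reaction], t_max: float,
              rng: random.Random) -> List[Tuple[float, Dict[str, int]]]:
    """Simulate until t_max or until no reaction can fire; return the trajectory."""
    t, state = 0.0, dict(state)
    trajectory = [(t, dict(state))]
    while True:
        props = [propensity(k, reac, state) for k, reac, _ in reactions]
        a0 = sum(props)
        if a0 == 0.0:
            break  # no reaction can fire any more
        tau = rng.expovariate(a0)  # waiting time to the next event ~ Exp(a0)
        if t + tau > t_max:
            break
        t += tau
        # choose reaction j with probability props[j] / a0
        r = rng.random() * a0
        j = max(i for i, a in enumerate(props) if a > 0)  # fallback for round-off
        for i, a in enumerate(props):
            r -= a
            if r < 0:
                j = i
                break
        _, reac, prod = reactions[j]
        for species, s in reac.items():
            state[species] -= s
        for species, s in prod.items():
            state[species] = state.get(species, 0) + s
        trajectory.append((t, dict(state)))
    return trajectory
```

For instance, the dimerization `2A -> B` started from 11 molecules of A stops with one A and five B, since the propensity `comb(1, 2)` of the last A is zero.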

These three methods have been implemented in an integrated tool, the Hsim system . It mimics the interactions of biomolecules in an environment modelling the membranes and compartments found in real cells. It has been applied to the modelling of the circadian clock of the cyanobacterium, and we have shown pertinent results regarding the spontaneous appearance of oscillations and the factors governing their period .

Synthetic biology

Synthetic biology is becoming a very popular research domain. Genetic engineering is a good example of synthetic biology: organisms are artificially modified to boost the production of compounds that might be used in the medical or industrial domains. We have focused on using synthetic biology for medical diagnostic purposes. In a collaboration with the SysdiagLab (UMR 3145) at Montpellier, P. Amar participates in the CompuBioTic project. The goal is to design, test and build an artificial embedded biological nano-computer in order to detect the biological markers of some human pathologies (colorectal cancer, diabetic nephropathy, etc.). This nano-computer is a small vesicle containing specific enzymes and membrane receptors. These components are chosen so that their interactions can sense and report the presence in the environment of molecules involved in the targeted human pathologies. We plan to design a dedicated software suite to help the design and validation of this artificial nano-computer. Hsim is used to help the design and to test this "biological computer" qualitatively and quantitatively before in vitro experiments.

Evaluating metabolic networks

It is now well established in the medical world that the metabolism of organs depends crucially on the way cells consume oxygen, glucose and the various metabolites that allow them to grow and divide. A particular variety of cells, tumour cells, is of major interest. In collaboration with L. Schwartz (AP-HP) and biologists from Inserm-INRA Clermont-Theix, we have started a project aiming at identifying the key points in the metabolic machinery that control changes in behaviour. The main difficulties come from the fact that biologists have listed dozens of concurrent cycles that can be activated alternatively or simultaneously, and that the dynamic characteristics of the chemical reactions are not accurately known.

Given the set of biochemical reactions that describe a metabolic function (e.g. glycolysis, phospholipid synthesis, etc.), we translate them into a set of o.d.e.'s whose general form is most often of the Michaelis-Menten type but whose coefficients are usually very poorly determined. The challenge is therefore to extract information about the system's behavior while making reasonable assumptions on the ranges of values of the parameters. It is sometimes possible to prove global stability mathematically, but it is also possible to establish it locally in large subdomains by means of simulations. Our program Mpas (Metabolic Pathway Analyser Software) makes the translation into a system of o.d.e.'s automatic, leading to easy, almost automatic simulations. Furthermore, we have developed a method of systematic analysis of the systems in order to characterize those reactants which determine the possible behaviors: usually they are enzymes whose high or low concentrations force the activation of one of the possible branches of the metabolic pathways. A first set of situations has been validated with an Inserm-Inra research team based in Clermont-Ferrand. In her PhD thesis, defended in 2011, M. Behzadi proved mathematically the decisive influence of the enzyme PEMT on the Choline/Ethanolamine cycles.
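To make the translation step concrete, the sketch below writes the Michaelis-Menten o.d.e.'s for a hypothetical two-enzyme chain S → I → P. It integrates them with a classical Runge-Kutta scheme and sweeps one poorly known coefficient (the second $V_{max}$) to see how the intermediate behaves. This is an illustration under stated assumptions, not Mpas. The pathway, the parameter names (`vmax1`, `km1`, `vmax2`, `km2`) and their values are all hypothetical.

```python
def michaelis_menten(vmax, km, s):
    return vmax * s / (km + s)

def pathway_rhs(y, p):
    """Hypothetical pathway S --E1--> I --E2--> P with Michaelis-Menten rates."""
    s, i, _ = y
    v1 = michaelis_menten(p["vmax1"], p["km1"], s)
    v2 = michaelis_menten(p["vmax2"], p["km2"], i)
    return [-v1, v1 - v2, v2]

def rk4(rhs, y0, p, dt, steps):
    """Classical fourth-order Runge-Kutta; returns the trajectory as a list of tuples."""
    y = list(y0)
    traj = [tuple(y)]
    for _ in range(steps):
        k1 = rhs(y, p)
        k2 = rhs([a + dt / 2 * b for a, b in zip(y, k1)], p)
        k3 = rhs([a + dt / 2 * b for a, b in zip(y, k2)], p)
        k4 = rhs([a + dt * b for a, b in zip(y, k3)], p)
        y = [a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
        traj.append(tuple(y))
    return traj

# Sweep one badly determined coefficient and record the peak of the intermediate I.
peaks = {}
for vmax2 in (0.5, 1.0, 2.0):
    params = {"vmax1": 1.0, "km1": 1.0, "vmax2": vmax2, "km2": 1.0}
    traj = rk4(pathway_rhs, (10.0, 0.0, 0.0), params, dt=0.01, steps=5000)
    peaks[vmax2] = max(state[1] for state in traj)
```

Because the S equation does not depend on I, a faster second enzyme can only lower the intermediate's peak. A parameter sweep like this is the simplest way to probe behavior when coefficients are only known as ranges.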

Comparison of Metabolic Networks

We study the potential of fungi for biomass transformation. Cellulose, hemicellulose and lignin are the main components of plant biomass. Their transformation represents one of the key energy challenges of the 21st century and should eventually allow the production of new high-value compounds, such as wood or liquid biofuels (gas or bioethanol). Among wood-boring organisms, two groups of fungi differ in how they degrade wood components. Analysing new fungal genomes can lead to the discovery of new species of high interest for bio-transformation. For a better understanding of how fungal enzymes facilitate the degradation of plant biomass, we conduct a large-scale analysis of the metabolism of fungi. Machine learning approaches such as hierarchical rule prediction are being studied to find new enzymes enabling the transformation of biomass. The Kegg database http://www.genome.jp/kegg/ contains pathways related to fungi and other species. By analysing these known pathways with rule mining approaches, we aim to predict new enzyme activities.
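The core of rule mining can be illustrated with plain single-item association rules. Each genome is treated as a "transaction" containing its enzyme activities. A rule X → Y is kept when X and Y co-occur often enough (support) and Y is usually present when X is (confidence). This is a minimal sketch, not the hierarchical rule prediction under study. The genomes, the enzyme labels and the thresholds are hypothetical toy values.

```python
from itertools import permutations

# Hypothetical toy data: enzyme activities (EC-like labels) annotated in each fungal genome.
GENOMES = {
    "fungus1": {"EC:A", "EC:B", "EC:C"},
    "fungus2": {"EC:A", "EC:B"},
    "fungus3": {"EC:A", "EC:B", "EC:C"},
    "fungus4": {"EC:B", "EC:C"},
}

def pair_rules(genomes, min_support=0.5, min_confidence=0.8):
    """Return rules (x, y, support, confidence) meaning 'activity x suggests activity y'."""
    n = len(genomes)
    items = sorted(set().union(*genomes.values()))

    def support(itemset):
        # fraction of genomes containing every activity of the itemset
        return sum(itemset <= g for g in genomes.values()) / n

    rules = []
    for x, y in permutations(items, 2):
        s_xy = support({x, y})
        if s_xy >= min_support:
            conf = s_xy / support({x})
            if conf >= min_confidence:
                rules.append((x, y, s_xy, conf))
    return rules
```

A retained rule such as EC:A → EC:B would suggest looking for activity B in a newly sequenced genome where only A has been annotated.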

Software and Platforms VARNA Yann Ponty correspondant Alain Denise

A lightweight Java applet dedicated to the quick drawing of RNA secondary structures. VARNA is open-source and distributed under the terms of the GNU GPL license. It automatically scales up and down to make the most out of a limited space, can draw multiple structures simultaneously, accepts a wide range of documented and illustrated options, and offers editing interactions. It exports the final diagrams in various file formats (SVG, EPS, JPEG, PNG, XFIG).

VARNA currently ships in its 3.9 version, and consists of $\sim$ 50 000 lines of code in $\sim$ 250 classes.

Impact: Downloaded $\sim$ 10 000 times and cited by more than 170 research manuscripts (source: Google Scholar).

Availability: Distributed under the terms of the GPL v3 licence since 2009, upon request to the author(s) at http://varna.lri.fr.

Cartaj Alain Denise correspondant

Cartaj is a software tool that automatically predicts the topological family of three-way junctions in RNA molecules from their secondary structure only: the sequence and the canonical Watson–Crick pairings. The Cartaj software (http://cartaj.lri.fr) that implements our method can be used online. It is also meant to be integrated into RNA modelling software and platforms. The methodology and the results of Cartaj are presented in . More than 300 visits have been recorded since its release in January 2012.

Rna3Dmotif Alain Denise correspondant

Rna3Dmotif is a free bundle of three easy-to-install programs designed to be used in combination to automatically extract recurrent RNA local tertiary motifs. The approach is based on a graph representation of the RNA tertiary structure using the Leontis–Westhof (LW) nomenclature. It was applied to several widely studied ribosomal RNA structures, and the motifs thus found were deposited in a dedicated repository.

Impact: Cited in 17 research manuscripts (source: Google Scholar).

Availability: Distributed under the terms of the licence since 24/03/2009, upon request to the author(s) at http://rna3dmotif.lri.fr.

GenRGenS Yann Ponty correspondant Alain Denise

A software tool dedicated to the random generation of sequences. It supports different classes of models, including weighted context-free grammars, Markov models, PROSITE patterns, etc. GenRGenS currently ships in its 2.0 version, and consists of $\sim$ 25 000 lines of code in $\sim$ 120 Java classes.
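As an illustration of one of the supported model classes, here is how a Markov model can be trained on example sequences and then used to generate new ones letter by letter. This is a minimal Python sketch, not GenRGenS's input format or API. The function names and the order-1 default are assumptions.

```python
import random

def train(seqs, order=1):
    """Count, for each context of `order` letters, how often each next letter follows it."""
    counts = {}
    for s in seqs:
        for i in range(order, len(s)):
            nxt = counts.setdefault(s[i - order:i], {})
            nxt[s[i]] = nxt.get(s[i], 0) + 1
    return counts

def generate(counts, length, order, start, rng):
    """Extend `start` letter by letter, sampling each letter from its context's counts."""
    seq = start
    while len(seq) < length:
        nxt = counts.get(seq[-order:])
        if not nxt:
            break  # context never seen during training
        letters = sorted(nxt)
        seq += rng.choices(letters, weights=[nxt[c] for c in letters])[0]
    return seq
```

For example, a model trained on `["ACACAC"]` can only alternate A and C, so `generate(train(["ACACAC"]), 8, 1, "A", random.Random(0))` returns `"ACACACAC"`.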

Impact: Downloaded $\sim$ 5 000 times and cited by more than 50 research manuscripts (source: Google Scholar).

Availability: Distributed under the terms of the GPL v3 licence since 2006, upon request to the author(s) at https://www.lri.fr/~genrgens/.

DiMoVo Julie Bernauer correspondant

DiMoVo, DIscriminate between Multimers and MOnomers by VOronoi tessellation. Knowing the oligomeric state of a protein is necessary to understand its function. This tool, accessible as a web server and still used and maintained, provides a reliable discrimination function to obtain the most favorable state of proteins.

Availability: released in 2008.

VorScore Julie Bernauer correspondant

VorScore, Voronoi Scoring Function Server. Scoring is a crucial part of a protein-protein docking procedure, and having a quantitative function to evaluate conformations is mandatory. This server provides access to a geometric knowledge-based evaluation function. It is still maintained and widely used. See Bernauer et al., Bioinformatics, 2007 23(5):555-562 for further details.

GeneValorization Bryan Brancotte Sarah Cohen-Boulakia correspondant

High-throughput technologies provide fundamental information concerning thousands of genes. Most current biological research laboratories use one or more of these technologies daily and identify lists of genes.

Understanding the results obtained includes accessing the latest publications concerning individual or multiple genes. Faced with the exponential growth of available publications, this task is becoming particularly difficult to achieve.

Here, we introduce a web-based Java application named GeneValorization, which aims at making the most of the text-mining effort done downstream of all high-throughput technology assays. Regular users come from the Curie Institute, but also from the EBI.

Impact: 925 distinct international users have used GeneValorization, and about a hundred use it on a regular basis. The tool is used on average once or twice a day.

Availability: it is available at http://bioguide-project.net/gv and has an Inter Deposit Digital Number (APP deposit, June 2013).

SPFlow Sarah Cohen-Boulakia correspondant

Scientific workflow systems are numerous and equipped with provenance modules able to collect data produced and consumed during workflow runs to enhance reproducibility. An increasing number of approaches have been developed to help manage provenance information. Some of them are able to process data in polynomial time, but they require workflows to have series-parallel (SP) structures. Rewriting any workflow into an SP workflow is thus particularly important.

SPFlow answers this need: it takes a workflow (from the Taverna system) as input and provides a runnable and provenance-equivalent (Taverna) workflow.

Impact: The tool is currently used by Taverna's users from the University of Manchester and more generally by myExperiment users.

Availability: Distributed under the terms of the licence since 04/02/2013, upon request to the author(s) at http://www.lri.fr/~chenj/SPFlow/.

SPChecker Sarah Cohen-Boulakia correspondant

SPChecker is able to detect whether or not any Taverna workflow has a series-parallel structure.
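One classical way to decide whether a workflow graph is series-parallel is by reduction. Repeatedly merge parallel edges between the same pair of nodes, and bypass any inner node with exactly one incoming and one outgoing edge. The graph is two-terminal series-parallel exactly when these reductions leave a single source-to-sink edge. The Python sketch below implements this textbook test. It assumes the workflow has already been abstracted as a DAG with a single source and a single sink. It is not SPChecker's code, and it does not read Taverna files.

```python
from collections import Counter

def is_series_parallel(edges, source, sink):
    """Two-terminal series-parallel test by repeated parallel/series reductions.
    edges: iterable of directed (u, v) pairs of a DAG."""
    multi = Counter(edges)  # edge -> multiplicity
    changed = True
    while changed:
        changed = False
        # parallel reduction: collapse duplicate edges between the same pair
        for e, c in list(multi.items()):
            if c > 1:
                multi[e] = 1
                changed = True
        # series reduction: bypass an inner node with one in-edge and one out-edge
        indeg, outdeg = Counter(), Counter()
        for u, v in multi:
            outdeg[u] += 1
            indeg[v] += 1
        for w in set(indeg) | set(outdeg):
            if w in (source, sink) or indeg[w] != 1 or outdeg[w] != 1:
                continue
            (u,) = [a for a, b in multi if b == w]
            (v,) = [b for a, b in multi if a == w]
            del multi[(u, w)], multi[(w, v)]
            multi[(u, v)] += 1
            changed = True
            break  # degrees changed: recompute them
    return list(multi.items()) == [((source, sink), 1)]
```

A diamond `s→a, s→b, a→t, b→t` is accepted. Adding the cross edge `a→b` yields the classical "bridge", which is rejected.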

Impact: The tool is currently used by Taverna's users from the University of Manchester and more generally by myExperiment users (a collaboration with Manchester has started and should significantly augment the number of potential users).

Availability: Distributed under the terms of the licence since 01/02/2013, upon request to the author(s) at http://www.lri.fr/~chenj/SPChecker/.

BioGuide Sarah Cohen-Boulakia correspondant Christine Froidevaux

BioGuide/BioGuideSRS: this software helps scientists choose suitable sources and tools, find complementary information in sources, and deal with divergent data.

Reference : Sarah Cohen-Boulakia, Olivier Biton, Susan Davidson, Christine Froidevaux, BioGuideSRS: Querying Multiple Sources with a user-centric perspective, Bioinformatics, March, 23(10), 1301-1303, 2007.

Impact: The paper related to the tool has been cited by $\sim$ 26 research manuscripts (source: Google Scholar) so far. Since 2007, BioGuide has had 8,030 distinct users, including regular users from the EBI (European Bioinformatics Institute), the Institut Curie and the Children's Hospital of Philadelphia.

Availability: Distributed under the terms of the licence since 01/09/2006, upon request to the author(s) at http://bioguide-project.net/.

HSIM Patrick Amar correspondant

Hsim (Hyperstructure Simulator) is a simulation tool for studying the dynamics of biochemical processes in a virtual bacterium. The model is given using a language based on probabilistic rewriting rules that mimic the reactions between biochemical species. Hsim is a stochastic automaton that implements an entity-centered model of objects. This kind of modelling approach is an attractive alternative to differential equations for studying the diffusion and interaction of the many different enzymes and metabolites in cells, which may be present in either small or large numbers.

The new version of Hsim includes a Stochastic Simulation Algorithm à la Gillespie that can be used with the same model, either standalone or mixed with the entity-centered algorithm. This new version also offers the possibility to export the model to Scilab for ODE integration. Finally, Hsim can export the differential equation system equivalent to the model to LaTeX for pretty-printing.

This software is freely available at http://www.lri.fr/~pa/Hsim. A compiled version is available for the Windows, Linux and MacOSX operating systems.

New Results RNA

RNA design through random generation

Extensive experiments revealed a drift of existing software towards sequences with a high G+C-content. Relying on our random generation methods, we showed how to control this distributional bias in sequences using a multidimensional Boltzmann sampling , . We also explored the combination of random generation (global sampling) and local search into a novel category of glocal approaches, yielding promising results.
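The principle of controlling the G+C-content by weighted (Boltzmann-like) sampling can be shown on a deliberately simplified model. Sequences compatible with a target secondary structure are drawn with probability proportional to $w^{\#GC}$. The single weight $w$ is tuned so that the expected G+C-content hits a target value. This Python sketch ignores the free-energy model entirely and uses one dimension (G+C) instead of several. The structure, the target value and the allowed pairs (canonical plus G-U wobble) are assumptions.

```python
import random

PAIRS = ["GC", "CG", "AU", "UA", "GU", "UG"]  # assumed allowed base pairs

def gc(s):
    return sum(c in "GC" for c in s)

def sites(structure):
    """('u', i) for unpaired positions, ('p', i, j) for base pairs of a dot-bracket string."""
    stack, out = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            out.append(("p", stack.pop(), i))
        else:
            out.append(("u", i))
    return out

def expected_gc(structure, w):
    """Expected G+C fraction when each site's option o has weight w ** gc(o)."""
    total = 0.0
    for site in sites(structure):
        opts = "ACGU" if site[0] == "u" else PAIRS
        ws = [w ** gc(o) for o in opts]
        total += sum(gc(o) * x for o, x in zip(opts, ws)) / sum(ws)
    return total / len(structure)

def tune_weight(structure, target, lo=1e-6, hi=1e6, iters=100):
    # expected_gc increases with w (exponential family), so bisect on a log scale
    for _ in range(iters):
        mid = (lo * hi) ** 0.5
        lo, hi = (mid, hi) if expected_gc(structure, mid) < target else (lo, mid)
    return (lo * hi) ** 0.5

def sample_sequence(structure, w, rng):
    seq = [None] * len(structure)
    for site in sites(structure):
        opts = "ACGU" if site[0] == "u" else PAIRS
        choice = rng.choices(opts, weights=[w ** gc(o) for o in opts])[0]
        if site[0] == "u":
            seq[site[1]] = choice
        else:
            seq[site[1]], seq[site[2]] = choice[0], choice[1]
    return "".join(seq)
```

Because every site is independent in this toy model, sampling is exact and linear-time. The real problem couples composition with an energy model, which is where multidimensional Boltzmann sampling is needed.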

Finally, we explored language-theoretic constructs, namely products of finite-state automata and context-free languages, to force or forbid the presence of identified functional motifs within designed sequences .

Towards 3D modeling of large molecules

Ab initio research benefited from our work on the search for and classification of RNA structural motifs . Significant progress towards the ab initio prediction of the 3D structure of large RNAs was achieved. This problem is beyond the scope of current approaches, and we proposed a promising coarse-grained approach based on game theory that scales up to several hundred bases.

Fast-fourier transform for riboswitch

In the field of RNA computational biology, many algorithms use dynamic programming to partition the folding landscape according to a set of structural parameters. More precisely, the goal is to compute the number (resp. cumulated Boltzmann weight) $c_{p_{1}, p_{2}, \ldots, p_{k}}$ of secondary structures having $p_{i}$ occurrences of some structural parameter $P_{i}$, where $P_{i}$ may denote the distance to a reference structure, the number of helices, base pairs, and so on. The resulting algorithms, although polynomial in theory, are usually unusable in practice, mainly due to their unreasonable complexities (typically $\Theta(n^{3 + 2k})$ time and $\Theta(n^{2 + k})$ memory for $k$ parameters) and to the intrinsic difficulties one encounters while trying to distribute their computation over multiple processors (highly connected dependency graph).

In collaboration with P. Clote's group (Boston College), we have described generic algorithmic principles to dramatically decrease these complexities and make this class of algorithms practical. The main idea is to capture the partitioned space within a large polynomial, which can typically be evaluated efficiently (typically in $\Theta(n^{3})$) as soon as the parameters are additive. One can then perform (possibly in parallel) $\Theta(n^{k})$ independent evaluations of the polynomial, and use the Discrete Fourier Transform to recover the coefficients in $\Theta(k \cdot n^{k} \cdot \log n)$ time. Applying these principles to the RNAbor algorithm, whose complexities were $\Theta(n^{5})$ time and $\Theta(n^{3})$ memory, we obtained a $\Theta(n^{4})$ time / $\Theta(n^{2})$ memory algorithm (parallelizable in $\Theta(n^{3})$ time / $\Theta(n^{2})$ memory on $m \to \infty$ processors). This led to a novel algorithm to detect bistable thermodynamic structures, such as riboswitches, which we presented at Recomb'13 .
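The evaluate-then-interpolate idea can be checked on a small case with a single parameter ($k = 1$): the number of base pairs. With $\omega = e^{2i\pi/N}$ and $N$ larger than the polynomial's degree, the coefficients are recovered exactly by the inverse DFT:

$$c_p = \frac{1}{N}\sum_{j=0}^{N-1} Z(\omega^{j})\,\omega^{-jp}.$$

The Python sketch below uses a plain Nussinov-style count (no energies) and a direct $\Theta(N^2)$ inverse DFT rather than an FFT. The pairing rules and the minimal hairpin length `MIN_LOOP` are assumptions, and the real method targets other parameters, such as the distance to a reference structure.

```python
import cmath

MIN_LOOP = 1  # assumed minimal number of unpaired bases in a hairpin
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def evaluate(seq, x):
    """Z(x) = sum over secondary structures of seq of x ** (number of base pairs).
    Nussinov-style recursion, Theta(n^3) per evaluation point."""
    n = len(seq)
    Z = [[1] * (n + 1) for _ in range(n + 1)]  # Z[i][j] covers seq[i:j]; empty -> 1
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = Z[i][j - 1]  # seq[j-1] unpaired
            for k in range(i, j - 1 - MIN_LOOP):  # seq[j-1] paired with seq[k]
                if (seq[k], seq[j - 1]) in PAIRS:
                    total += Z[i][k] * x * Z[k + 1][j - 1]
            Z[i][j] = total
    return Z[0][n]

def coefficients(seq):
    """c[p] = number of structures with p base pairs, recovered by the inverse DFT."""
    N = len(seq) // 2 + 1  # at most n // 2 base pairs, so deg Z < N
    omega = cmath.exp(2j * cmath.pi / N)
    values = [evaluate(seq, omega ** j) for j in range(N)]  # independent, parallelizable
    return [round((sum(values[j] * omega ** (-j * p) for j in range(N)) / N).real)
            for p in range(N)]
```

The $N$ evaluations share nothing, which is what makes the approach easy to distribute across processors.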

Sequences Random generation

The random generation of decomposable combinatorial structures, pioneered by P. Flajolet in the 80s, provides an elegant, yet powerful, framework to model and sample the objects which appear in computational biology. Random samples can then be used to assess the significance of a given observable when closed-form formulae are difficult to obtain.
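The "recursive method" for decomposable structures works in two steps. First, count the objects of each size from the decomposition; then follow the same decomposition backwards, choosing each case with probability proportional to its count. The result is a uniform sample. The Python sketch applies it to RNA secondary structures in dot-bracket notation. In that decomposition, the last position is either unpaired or paired with an earlier position $k$, which splits the rest into two independent sub-structures. The minimal hairpin length `MIN_LOOP` is an assumption.

```python
import random
from functools import lru_cache

MIN_LOOP = 1  # assumed minimal number of unpaired bases inside a hairpin

@lru_cache(maxsize=None)
def count(n):
    """Number of secondary structures of length n."""
    if n <= MIN_LOOP + 1:
        return 1  # too short for any base pair
    total = count(n - 1)  # last position unpaired
    for k in range(0, n - 1 - MIN_LOOP):  # last position paired with position k
        total += count(k) * count(n - k - 2)
    return total

def sample_structure(n, rng):
    """Uniform random secondary structure of length n (recursive method)."""
    if n <= MIN_LOOP + 1:
        return "." * n
    r = rng.randrange(count(n))
    if r < count(n - 1):
        return sample_structure(n - 1, rng) + "."
    r -= count(n - 1)
    for k in range(0, n - 1 - MIN_LOOP):
        block = count(k) * count(n - k - 2)
        if r < block:
            return sample_structure(k, rng) + "(" + sample_structure(n - k - 2, rng) + ")"
        r -= block
    raise AssertionError("unreachable: r < count(n)")
```

With `MIN_LOOP = 1`, the counts for lengths 3, 4 and 5 are 2, 4 and 8. Each of the four structures of length 4 is returned with probability 1/4.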

Messenger RNAs (mRNAs) encode proteins, but may also independently feature structured motifs which are crucial to recoding and alternative splicing mechanisms. In order to predict such motifs, the stability of smaller regions within a given mRNA must be compared to that of sequences generated with respect to a background model which, at the same time, preserves the encoded amino-acid sequence and the capacity of the overall sequence to form a stable fold (approximated by the dinucleotide composition). Using multidimensional Boltzmann sampling, we have revisited the underlying – well-defined, yet never solved exactly – random generation problem, and provided the first unbiased and practical algorithm for the problem . The algorithm, developed in collaboration with McGill and Université de Montréal (Canada), has linear time complexity as soon as a small tolerance (typically $\Theta(1 / \sqrt{n})$) on the composition is allowed.

Some other biological objects, such as RNA secondary structures, naturally appear with probabilities that are poorly modeled by the uniform distribution. To better model such objects, Denise et al. introduced the weighted distribution, and adapted classic random generation algorithms so that each object within a given combinatorial family can be generated with respect to it. However, the exponentially increasing probability ratio between the most and least probable objects sometimes leads to a large degree of redundancy within generated sets . To work around this issue and generate non-redundant sets of objects, we have proposed a sequential algorithm which deterministically avoids any previously generated word, without introducing any bias in the generation .
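The mechanism for avoiding previously generated objects without bias can be illustrated on the simplest weighted family: words of fixed length whose weight is the product of their letter weights. Words are built letter by letter. Each candidate prefix gets the total weight of its completions minus the weight of the words already drawn below it, and these removed weights are kept in a table indexed by prefix. Each new word is therefore drawn with probability proportional to its weight among the words not drawn yet. This Python sketch works for this word model only, not for context-free grammars. Integer letter weights are assumed so the subtraction stays exact.

```python
import random
from collections import defaultdict

def nonredundant_sample(weights, n, k, rng):
    """Draw k distinct words of length n; each draw picks a not-yet-drawn word w
    with probability weight(w) / (total weight of not-yet-drawn words).
    weights: letter -> positive integer weight (integers keep the arithmetic exact)."""
    alphabet = sorted(weights)
    total = sum(weights.values())
    if k > len(alphabet) ** n:
        raise ValueError("not enough distinct words")
    removed = defaultdict(int)  # prefix -> summed weight of drawn words with that prefix
    drawn = []
    for _ in range(k):
        prefix, wp = "", 1
        for pos in range(n):
            rest = n - pos - 1
            # weight of all completions of prefix + a, minus the words already drawn
            masses = [wp * weights[a] * total ** rest - removed[prefix + a] for a in alphabet]
            u = rng.randrange(sum(masses))
            for a, m in zip(alphabet, masses):
                if u < m:
                    break
                u -= m
            prefix, wp = prefix + a, wp * weights[a]
        for i in range(n + 1):
            removed[prefix[:i]] += wp  # prune the drawn word from every prefix count
        drawn.append(prefix)
    return drawn
```

Asking for all $4^3 = 64$ words of length 3 returns each of them exactly once, in a weight-biased order.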

Besides, in collaboration with the Fortesse group at Lri, we developed a new divide-and-conquer algorithm for the random generation of words of regular languages, and we performed a complete benchmark of all state-of-the-art methods dedicated to this problem .

Next Generation Sequencing (NGS)

As a side-product of our previous collaborative studies with J. Waldispühl (McGill, Canada), focusing on the sequence/structure relationship in RNA, we revisited the problem of detecting and correcting errors in RNA sequences obtained using pyrosequencing techniques. Indeed, ribosomal RNAs are often used to estimate the population diversity within a microbiome, and sequencing errors may lead to biased estimates. In this context, we investigated whether a complete knowledge of the RNA secondary structure could be exploited to detect and correct errors in NGS reads.

To that end, we introduced a probabilistic model, defined over all sequences at maximal distance $d$ from the input read and their respective foldings. This model captures both the stability of the induced fold and its compatibility with a reference multiple sequence alignment. We designed a linear-time inside/outside algorithm to compute exactly the probability that a given position is mutated in the ensemble. Our initial implementation, presented at Recomb'13 and published in an extended version in the Journal of Computational Biology , revealed encouraging results, and we plan to combine it with a population diversity estimator to test its potential in a metagenomics context.
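The inside/outside principle for distance-bounded ensembles can be illustrated with a deliberately simplified model that drops the secondary structure. Every sequence within Hamming distance $d$ of the read is weighted by a product of per-position scores. A forward (inside) and a backward (outside) table indexed by the number of mutations then give, for each position, the exact probability that it is mutated, in $O(n d^2)$ time. This Python sketch does not implement the published model, which also scores the induced fold. The per-position scores (e.g. derived from an alignment profile) are a hypothetical input. A brute-force enumerator is included only to check small cases.

```python
from itertools import product

ALPHABET = "ACGU"

def mutation_probabilities(read, score, d):
    """P(position i differs from the read) in the ensemble of all sequences within
    Hamming distance d of `read`, a sequence x being weighted by prod_i score[i][x_i]."""
    n = len(read)
    keep = [score[i][read[i]] for i in range(n)]
    mut = [sum(score[i][b] for b in ALPHABET if b != read[i]) for i in range(n)]
    # inside (forward): F[i][m] = weight of prefixes x[:i] with exactly m mutations
    F = [[0.0] * (d + 1) for _ in range(n + 1)]
    F[0][0] = 1.0
    for i in range(n):
        for m in range(d + 1):
            F[i + 1][m] += F[i][m] * keep[i]
            if m < d:
                F[i + 1][m + 1] += F[i][m] * mut[i]
    # outside (backward): B[i][m] = weight of suffixes x[i:] with exactly m mutations
    B = [[0.0] * (d + 1) for _ in range(n + 1)]
    B[n][0] = 1.0
    for i in range(n - 1, -1, -1):
        for m in range(d + 1):
            B[i][m] = B[i + 1][m] * keep[i] + (B[i + 1][m - 1] * mut[i] if m > 0 else 0.0)
    Z = sum(F[n])
    # position i mutated: a mutations before it, one at i, c after it, a + 1 + c <= d
    return [sum(F[i][a] * mut[i] * B[i + 1][c] for a in range(d) for c in range(d - a)) / Z
            for i in range(n)]

def brute_force(read, score, d):
    """Exhaustive enumeration of the same ensemble, only for checking small cases."""
    n = len(read)
    Z, acc = 0.0, [0.0] * n
    for x in product(ALPHABET, repeat=n):
        if sum(a != b for a, b in zip(x, read)) > d:
            continue
        w = 1.0
        for i, b in enumerate(x):
            w *= score[i][b]
        Z += w
        for i in range(n):
            if x[i] != read[i]:
                acc[i] += w
    return [a / Z for a in acc]
```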

Combinatorics of motifs

An algorithm for p-value computation that takes into account a Hidden Markov Model has been proposed in , and an implementation, SufPref, has been realized (http://server2.lpm.org.ru/bio).

Combinatorics of clumps have been extensively studied, leading to the definition of the so-called canonic clumps. It is shown in that they contain the information needed to calculate, approximate, and study probabilities of occurrences and asymptotics. This motivates the development of a clump automaton, which allows for a derivation of p-values while decreasing the space and time complexity compared with the generating function approach or previous weighted automata.
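As a baseline for the automaton-based approaches mentioned above, the exact p-value $P(N_n \ge t)$ of a word can be computed with a dynamic program. Here $N_n$ is the number of possibly overlapping occurrences in a random text of length $n$. The dynamic program runs over the states of the word's KMP (pattern-matching) automaton, with the occurrence count capped at $t$. This Python sketch assumes i.i.d. letters. It is not the clump automaton, and it does not handle the Hidden Markov Model case. The letter probabilities in the usage examples are hypothetical.

```python
from collections import defaultdict

def kmp_automaton(word, alphabet):
    """State = length of the longest suffix of the text read so far that is a prefix
    of `word`; reaching state len(word) means an occurrence ends here."""
    m = len(word)
    border = [0] * (m + 1)  # border[j]: longest proper border of word[:j]
    k = 0
    for i in range(1, m):
        while k and word[i] != word[k]:
            k = border[k]
        if word[i] == word[k]:
            k += 1
        border[i + 1] = k
    delta = [{} for _ in range(m + 1)]
    for state in range(m + 1):
        for a in alphabet:
            if state < m and word[state] == a:
                delta[state][a] = state + 1
            elif state == 0:
                delta[state][a] = 0
            else:
                delta[state][a] = delta[border[state]][a]  # fall back, keeps overlaps
    return delta

def pvalue(word, n, t, probs):
    """Exact P(N >= t), N = number of (possibly overlapping) occurrences of `word`
    in a random text of n i.i.d. letters drawn with probabilities `probs`."""
    delta = kmp_automaton(word, sorted(probs))
    dist = {(0, 0): 1.0}  # (automaton state, count capped at t) -> probability
    for _ in range(n):
        nxt = defaultdict(float)
        for (state, cnt), p in dist.items():
            for a, pa in probs.items():
                s2 = delta[state][a]
                nxt[(s2, min(t, cnt + (s2 == len(word))))] += p * pa
        dist = nxt
    return sum(p for (_, cnt), p in dist.items() if cnt >= t)
```

For instance, with two equiprobable letters, half of the 16 texts of length 4 contain `"AA"`, so `pvalue("AA", 4, 1, {"A": 0.5, "B": 0.5})` is 0.5. The memory used is $O(|w| \cdot t)$ states per step. Clump-based automata aim to reduce exactly this kind of cost.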

Large deviation approximations are needed for very rare events, e.g. very small p-values, as Gaussian approximations are known not to be applicable. In , combinatorial properties of words make it possible to provide an explicit and tractable formula for the tail distribution with a low space and time complexity and a guaranteed tightness. The double-strand counting problem is addressed, where dependencies between a sequence and its complement play a fundamental role. A large deviation result is also provided for a set of small sequences with non-identical distributions. Possible applications include the search for cis-acting elements in sets of regulatory sequences that are known, for example from ChIP-chip or ChIP-Seq experiments, to be under similar regulatory control. In a recent internship at Lix, F. Pirot detected a Chi-like motif in an archaeal genome.

In a collaboration with AlFarabi University (where M. Régnier acts as a foreign co-advisor), word statistics were used to identify mRNA targets for miRNAs involved in various cancers , .

3D Modelling and Interactions

Transmembrane beta-barrel proteins (TMB) account for 20 to 30% of identified proteins in a genome but, due to difficulties with standard experimental techniques, they constitute only 2% of the RCSB Protein Data Bank. Therefore, we study and design algorithmic solutions addressing the secondary structure, an abstraction of the 3D conformation of a molecule that only retains the contacts between its residues. Although this representation may disregard some of the fine details of the molecule conformation, it still retains the general architecture of molecules, and is especially useful in the study of RiboNucleic Acids (RNAs) and TMB. As TMB perform many vital functions, the prediction of their structure is a challenge for life sciences, while the small number of known structures prohibits knowledge-based methods for structure prediction. As TMBs are strongly structured objects, model-based methodologies , are an interesting alternative to these conventional methods. The efficiently obtained 3D structures provide a good model for further 3D and interaction analyses.

In a recent work , we focused on the identification of protein-protein complexes based on the putative interaction between pairs of proteins as the sole source of information. From the results obtained on E. coli, we started working on the prediction of multi-body protein complexes from sequence information alone.

In our protein-RNA project, we managed to obtain the first learning results. We optimized the RosettaDock scores and showed that such an optimization cannot be done efficiently without expert knowledge. The first results are to be presented at EGC in 2014 .

Large scale cross-docking study of the specificity of protein-protein interactions

The year 2013 saw the conclusion of a long-term collaboration involving A. Carbone (UPMC) and A. Lopes (IGM, Paris XI). In a recent paper published in PLoS Computational Biology, we showed that combining coarse-grain molecular cross-docking simulations and binding site predictions based on evolutionary sequence analysis is a viable route to identify true interacting partners for hundreds of proteins with a varied set of protein structures and interfaces. We also carried out a large-scale analysis of protein binding promiscuity and provided a numerical characterization of partner competition and level of interaction strength for about 28 000 false-partner interactions. Finally, we demonstrated that binding site prediction is useful not only to discriminate native partners, but also to scale up the approach to thousands of protein interactions. This study relied on a large computational effort made by thousands of internet users helping the World Community Grid over a period of 7 months.

Data Integration

Work performed in the Data Integration axis this year has been dedicated to the design and implementation of a new approach to reduce the complexity of scientific workflow structures. More precisely, we focused on the presence of “anti-patterns” in the workflow structures, idiomatic structures that lead to over-complicated design. We have then proposed the DistilFlow method and a tool for automatically detecting such anti-patterns and replacing them with different patterns which result in a reduction in the workflow's overall structural complexity (BMC Journal paper accepted, published early 2014). This work has been performed in close collaboration with the Taverna group from the University of Manchester.

DistilFlow is part of the PhD thesis of J. Chen, who defended on October 11th, 2013 and is now back in China as a research assistant at Lanzhou University.

Systems Biology

Systems Biology includes the study of interaction networks such as gene regulatory, metabolic, or signaling networks. It involves both designing the topology of the networks and predicting their dynamic and spatiotemporal aspects. It requires the import of concepts from across various disciplines and crosstalk between theory, benchwork, modelling and simulation.

Topological analysis of metabolic networks

In we have developed a biclustering algorithm for elementary flux modes that is based on the Agglomeration of Common Motifs (ACoM). This allows a drastic reduction of the number of less significant fluxes and a factorization of the most important fluxes, yielding an algorithm running in quadratic time in the number of elementary flux modes.

We applied this algorithm to describe the decomposition into elementary flux modes of the central carbon metabolism in Bacillus subtilis and of the yeast mitochondrial energy metabolism. For Bacillus subtilis, a specific inhibition on the second domain of the lipoamide dehydrogenase (pdhD) component of pyruvate dehydrogenase complex that leads to the loss of all fluxes was exhibited . Such a conclusion is not predictable in the classical approach.

Evolution of metabolic networks

A collaboration with Igm on the evolution of metabolic networks is ongoing. We aim at understanding how such networks would emerge over time among the variety of species, and how these changes could be responsible for characteristic life traits. Our methodology to characterize the evolutionary origin of the enzymatic repertoire of different fungal groups relies on machine learning. Preliminary results were presented at Jobim 2013 .

Signaling networks

Our goal is to help the understanding of signaling pathways involving GPCRs and to provide means to semi-automatically construct the signaling networks. Our method takes into account various kinds of biological experiments and their origin, and automatically builds and draws the inferred network. Comparing the automatically deduced network with an already known fragment of the FSHR network allowed us to obtain new interesting hypotheses that are currently being tested experimentally by biologists, our collaborators from Inra-Bios in Tours. In the coming months, experimental data for some GPCRs (FSH, 5HT2 and 5HT4) will be prepared by Bios and Igf (Montpellier), in the context of the GPCRnet ANR project.

Besides, in collaboration with K. Inoue, through the NII International Internship Program, we have studied the Systems Biology Graphical Notation (SBGN), a standard for expressing molecular networks, especially signaling networks, and proposed a translation of SBGN-AF into a logical formalism .

Modelling with Hsim

In a collaboration of P. Amar with microbiologists, the group of Marie-Joëlle Virolle from the Institut de Génétique et de Microbiologie, a first explanatory model was proposed for the sigmoid shape of the survival curve of bacteria (S. lividans) carrying an antibiotic resistance gene, expressed at different levels, in the presence of a constant concentration of antibiotics , , , .

This is particularly important since inserting an antibiotic resistance gene to report the activity of its promoter is a method widely used in the Streptomyces community.

Cancer and metabolism

It is shown in M. Behzadi's PhD thesis that most systems have very stable behaviours and that even large variations of their chemical characteristics do not affect the nature of the equilibria. This very general situation has been discovered by simulation but in some cases it is even possible to prove it mathematically.

Our collaborators M. Israël and L. Schwartz have listed more than a hundred such tentative bifurcations that we intend to study systematically. A preliminary study of the mitotic cycle with L. Paulevé has also highlighted the strong influence of the cell's pH on its capacity to divide. The PhD thesis of E. Bigan, co-supervised by S. Daoudi (Univ. Denis Diderot) and J.-M. Steyaert, investigates the generic properties of such complex systems and confirms that the ones we have already studied are not exceptions . Some prospective cases are studied in .

Partnerships and Cooperations Regional Initiatives

A. Denise is the coordinator of the "Japarin-3D" Digiteo project 2012-2016. This project, in collaboration with Prism at Versailles, aims to develop new efficient approaches for predicting the 3D structure of large RNA molecules, by applying game theory and graph algorithms.

National Initiatives ANR

A. Denise is involved in the NSD-NGD ANR project 2010-2014. Y. Ponty is involved in the Magnum ANR project (BLAN program, 12/2010–12/2014).

PEPS

Ch. Froidevaux was responsible for the CNRS-INSERM-Inria Peps grant Identification of metabolic capabilities of fungi by comparative genomic involving Igm, Paris-Sud and UMR GV, CNRS.

European Initiatives

Program: Partenariat Hubert Curien (PHC) Procope (Jointly funded by Egide and DAAD)

Project acronym: SOSW

Project title: Sharing and Optimizing Scientific Workflows

Duration: 2013 - 2015

Coordinator: Sarah Cohen-Boulakia

International Partner

U. Humboldt (Berlin, Germany)

Institute for Computer Science

Ulf Leser

Abstract : Considerable effort has been put into the development of scientific workflow management systems. They support scientists in developing, running, and monitoring chains of data analysis programs. A variety of systems have reached a level of maturity that allows them to be used by scientists for their bioinformatics experiments, especially including analysis of NGS data. However, each scientific group has its own way of analyzing NGS data, using a particular set of tools, in a particular order. The aim of this project is to exploit the complementary skills of the two European groups involved to develop approaches promoting exchange of (optimized) workflows.

International Initiatives Inria Associate Teams ITSNAP

Title: Intelligent Techniques for Structure of Nucleic Acids and Proteins

Inria principal investigator: Julie Bernauer

International Partner (Institution - Laboratory - Researcher):

Stanford University (United States) - Computational Structural Biology, School of Medicine, Structural Biology - Julie Bernauer

Duration: 2012 - 2014

See also: http://www.lix.polytechnique.fr/~bernauer/EA_ITSNAP/

The ITSNAP Associated Team project is dedicated to the computational study of RNA 3D structure and interactions. By developing new molecular hierarchical models for knowledge-based and machine learning techniques, we can provide new insights on the biologically important structural features of RNA and its dynamics. This knowledge of RNA molecules is key in understanding and predicting the function of current and future therapeutic targets.

Inria International Partners Declared Inria International Partners

CARNAGE

Program: Inria-Russia

Title: CARNAGE: Combinatorics of Assembly and RNA in GEnomes

Inria principal investigator: Mireille Régnier

International Partner (Institution - Laboratory - Researcher):

State Research Institute of Genetics and Selection of Industrial Microorganisms (Russian Federation) - Bioinformatics laboratory - Mireille Régnier

Duration: 2012 - 2014

See also: https://team.inria.fr/amib/carnage

CARNAGE uses combinatorial methods to address two main issues in the analysis of genomic sequences.

The fast development of high-throughput technologies has created a new challenge for computational biology. The recently introduced competing technologies each promise dramatic breakthroughs in both biology and medicine. At the same time, the main bottleneck in applications is the computational analysis of the experimental data: the sheer amount of data, as well as the throughput of the experimental dataflow, represents a serious challenge for hardware and especially for software. We aim to bridge some of the gaps between the new "next-generation" sequencing (NGS) technologies and the current state of the art in computational techniques for whole-genome comparison. Our focus is on combinatorial analysis for NGS data assembly, interspecies chromosomal comparison, and the definition of standard pipelines for routine large-scale comparison.

This project also addresses the combinatorics of RNA and the prediction of RNA structures and of their possible interactions.

Informal International Partners

Polytechnique/UPSud and McGill/U. Montréal

Program: CFQCU

Title: Réseau franco-québécois de recherche sur l'ARN (French-Québec research network on RNA)

Inria principal investigator: Jean-Marc Steyaert

International Partner (Institution - Laboratory - Researcher):

McGill University and Université de Montréal (Canada)

Computer Science Department

Jérôme Waldispühl

Duration: 2012 - 2014

Summary: The partners have developed complementary expertise on RNA: bioinformatics, combinatorics and algorithms, machine learning, physics, and genomics. Methodologies will be developed that combine theoretical simulations with new (high-throughput) experimental data. Joint high-level training is organized at the Master and PhD levels.

Inria International Labs

R. Fonseca spent 5 months at SLAC in Stanford to work with Henry van den Bedem. J. Bernauer spent two weeks at SLAC. The associated team members also presented their work at the Inria BIS 2013 Workshop in Stanford https://project.inria.fr/inria-siliconvalley/workshops/bis2013/.

Participation in Other International Programs

NII International Internship Program

Adrien Rougny was an intern at NII from February to August 2013, supported by the NII International Internship Program. He worked on the topic "Inference and Learning for Systems Biology and Network Dynamics" in Prof. Katsumi Inoue's group, a long-standing collaborator of Ch. Froidevaux.

PHC Procore

J. Bernauer coordinates, together with Prof. X. Huang of the Hong Kong University of Science and Technology, a Partenariat Hubert Curien (PHC) Procore project (2012-2013) entitled "Computational studies of conformational dynamics of the RNA-induced silencing complex and design of miRNAs to target oncogenes".

International Research Visitors

Visits of International Scientists

H.K. Hwang

Subject: Probabilistic Analysis of A Simple Evolutionary Algorithm

Institution: Taipeh University (Taiwan)

V. Reinharz

Subject: RNA 3D structure analysis

Institution: McGill University (Canada)

E. Furletova

Subject: word enumeration

Institution: Institute of Mathematical Problems in Biology (Russia)

Internships

C. Moutet (May and June 2013)

Subject: Poor mappability regions in assembly

Institution: ENS Lyon and Ecole Polytechnique Fédérale de Lausanne

Funding: Inria

Supervision: M. Régnier

F. Pirot (May and June 2013)

Subject: Exceptional words in archaeal genomes

Institution: ENS Lyon

Funding: Inria

Supervision: M. Régnier

B. Fang (May to July 2013)

Subject: Clumps combinatorics, automata and word asymptotics

Institution: Princeton University (United States)

Funding: Ecole Polytechnique

Supervision: M. Régnier

J. Moussu (April to July 2013)

Subject: Repeats in genomic sequences

Institution: Rennes University

Funding: Inria

Supervision: M. Régnier

M. Pichene (April to July 2013)

Subject: Graph algorithms and protein-protein interactions

Institution: Paris-Sud University

Funding: Inria

Supervision: J. Bernauer

L. Uroshlev (June 2013)

Subject: Reference state for RNA KB potentials

Institution: IOGEN (Moscow, Russia)

Funding: Inria (CARNAGE)

Supervision: J. Bernauer

O. Berillo (January and December 2013)

Subject: miRNAs and oncogenes.

Institution: El Farabi University (Almaty, Kazakhstan)

Funding: El Farabi University

Supervision: M. Régnier

A. Bari (March 2013)

Subject: stress-inducible miRNAs

Institution: El Farabi University (Almaty, Kazakhstan)

Funding: El Farabi University

Supervision: M. Régnier

Visits to International Teams

Sep. 2013–Sep. 2014: Y. Ponty is visiting PIMS and Simon Fraser University (Vancouver, Canada)

Dissemination

Scientific Animation

French Community

Participants: Patrick Amar, Jérôme Azé, Julie Bernauer, Sarah Cohen-Boulakia, Alain Denise, Christine Froidevaux, Sabine Pérès, Yann Ponty, Mireille Régnier, Jean-Marc Steyaert

The whole team is involved in GDR-Bim (Molecular Bioinformatics, http://www.gdr-bim.u-psud.fr/). J. Azé is the webmaster. A. Denise is a member of the Scientific Committee. Y. Ponty coordinates the "Structure et interactions des macromolécules" (Structure and interactions of macromolecules) scientific axis. C. Froidevaux and S. Cohen-Boulakia participate in the Knowledge Representation, Ontologies, Data Integration and Grids subdomain.

A. Denise, Y. Ponty and M. Régnier participate in the Sequence Analysis subdomain and in the Comatege subgroup of GDR-Im (Informatique Mathématique, http://www.gdr-im.fr/).

A. Denise, Y. Ponty, J.-M. Steyaert, and M. Régnier are involved in the Alea working group (http://igm.univ-mlv.fr/~nicaud/webalea/) of the GDR-Im (Informatique Mathématique, http://www.gdr-im.fr/).

Seminars and visits

Amib seminars

Speakers at our weekly seminar included D. Saakian (A. Sinica, Taiwan), V. Reinharz (McGill), L. Tchertanov (ENS Cachan), H. Babou (Nantes), P. Ballarini (Ecp), Nicolas Ferey (Limsi), Van Du TRAN Thong (Igm), Ulf Leser (Humboldt U.), A. Zinoviev (Institut Curie, Paris), and H.K. Hwang (Taipeh U.).

Other seminars

J. Bernauer presented her work at the International Conference on Biomolecular Dynamics: Experiment Meet Computation, KAUST, Saudi Arabia.

R. Fonseca gave a talk at the Inria@SiliconValley Workshop Bis2013 in May in Stanford.

D. Iakovishina gave two talks at Institut Curie: one at the weekly NGS seminar and one during the "Structural variants day" in December 2013.

International exchanges

J. Bernauer visited H. van den Bedem at SSRL (Slac) and M. Levitt at Stanford University (USA). She visited the Huang group at Hkust (Hong-Kong).

M. Régnier and D. Iakovishina visited IoGene (Moscow).

Program Committee

P. Amar was a member of the steering committee and chair of the organizing committee of the aSSB workshop (advances in Systems and Synthetic Biology), Nice, 2013.

J. Bernauer was a member of ICBD 2013 program committee.

S. Cohen-Boulakia was a member of the DILS 2013, SWEET 2013 and BDA 2013 program committees, and she is a member of the editorial board of the Journal on Data Semantics.

Ch. Froidevaux is a member of the editorial board of 1024, Bulletin de la Société Informatique de France, SIF.

Y. Ponty, M. Régnier, and J.-M. Steyaert served as PC members for Bicob 2013 (5th International Conference on Bioinformatics and Computational Biology, Honolulu, USA).

Y. Ponty served as PC member for Ismb/Eccb 2013 (21st International conference on Intelligent Systems for Molecular Biology/12th European Conference on Computational Biology).

M. Régnier co-organized Mccmb'13 http://mccmb.belozersky.msu.ru/2013/.

F. d'Alché-Buc, Ch. Froidevaux and Y. Ponty were members of Jobim 2013 program committee.

J. Bernauer organized the Iamb workshop (Integrative Approaches for Modeling Biomolecular Complexes) in Nice in collaboration with McGill University (Canada) and Nice University.

A one-day meeting on Cancer and Metabolism was organized at Lix by J.-M. Steyaert on October 4th.

Research administration

J. Bernauer is a member of the Life Sciences working group (Groupe de travail Sciences du Vivant) of the IDEX Paris-Saclay.

J. Bernauer and C. Froidevaux are members of the steering committee (Comité de Pilotage) of the IDEX Paris-Saclay Institut transverse de Modélisation des Sciences du Vivant.

A. Denise is a member of the Scientific Commission of the Inria-Saclay research center. He is deputy director of the computer science department at University Paris-Sud. He is a member of the Academic Senate of the Paris-Saclay University.

Ch. Froidevaux is the head of the Bioinfo group at Lri. She was a member of a hiring committee for a Full Professor position at Polytech Paris Sud, Orsay.

Y. Ponty is an elected member of the Comité national du CNRS (Section 6, Foundations of Computer Science, and CID 51, Bioinformatics).

M. Régnier is a deputy member of the Digiteo program committee.

J.-M. Steyaert is a member of the Board of Administrators of Polytechnique.

J.-M. Steyaert contributed to the organization of a workshop in July 2013 presenting ongoing joint projects between AP-HP and Polytechnique. He serves on the committee that selects an MD from AP-HP for a yearly funded research position at the Polytechnique Research Center.

Teaching - Supervision - Juries

Teaching

We have trained, and will continue to train, strong multidisciplinary students at both the Master and PhD levels. Being recognized in this community as a serious training group is an asset. Our project is also heavily involved in two major student programs in France: the Master BIBS (Bioinformatique et Biostatistique) at Université Paris-Sud/École Polytechnique and the parcours d'Approfondissement en Bioinformatique (advanced bioinformatics track) at École Polytechnique. We are also involved in a student partnership with McGill University (the France-Québec partnership, which offers French and Canadian students co-supervised internships, either short-term, from 3 to 6 months, or long-term, as part of their PhD studies). J.-M. Steyaert is involved in developing an interdisciplinary cooperation between Polytechnique and AP-HP that will favor internships of Polytechnique and Master students in AP-HP hospital departments.

Ch. Froidevaux is a member of the Scientific Committee of the Computer Science Doctoral School of Paris-Sud University.

J.-M. Steyaert organizes Bibs (M1 and M2) at Ecole Polytechnique. Ch. Froidevaux co-heads the Master (M1 and M2) at the University Paris Sud. Most team members teach in this Master.

J. Bernauer was appointed Chargé d'enseignement in the Computer Science Department of École Polytechnique (DIX) in 2013.

Master Bibs: J. Bernauer, Informatique théorique et Programmation Python (Theoretical Computer Science and Python Programming), 20h, M2, Université Paris-Sud, France

Cycle Ingénieur Polytechnicien: J. Bernauer, Modal Bioinformatique, 18h, 2nd year, École Polytechnique, France

Cycle Ingénieur Polytechnicien: J. Bernauer, Algorithmes et Programmation INF421 (Algorithms and Programming), 36h, 2nd year, Ecole Polytechnique, France

Cycle Ingénieur Polytechnicien: J. Bernauer, Modal Web Tablette INF441a, 36h, 2nd year, Ecole Polytechnique, France

Cycle Ingénieur Agro Paris Tech: J. Bernauer, Module AAB, guest lecture, 3rd year, Agro Paris Tech, France

Master Bibs: Y. Ponty, M. Régnier, J.-M. Steyaert, Combinatoire, Algorithmes, Séquences et Modélisation (Casm; Combinatorics, Algorithms, Sequences and Modelling), 32h, M2, Université Paris-Sud, France

Master: J.-M. Steyaert, X cycle ingénieur INF582- Datamining, 35h, M1, Ecole Polytechnique, France

Master: J.-M. Steyaert, X cycle ingénieur BioINF588- Algorithms for Bioinformatics, 35h, M1, Ecole Polytechnique, France

Licence: J.-M. Steyaert, X cycle ingénieur Modal-BioInformatique, 45h, L3, Ecole Polytechnique, France

Master: J.-M. Steyaert, Bibs Algorithmique avancée et optimisation (Advanced Algorithmics and Optimization), 25h, M2, X-Orsay, Ecole Polytechnique, France

Data Bases, 48h, M1 Bibs (Bioinformatics and BioStatistics), Paris-Sud University, France (C. Froidevaux)

Advanced Algorithmics, 48h, M1 Bibs (Bioinformatics and BioStatistics), Paris-Sud University, France (C. Froidevaux)

Integration and Analysis of heterogeneous data from the Web, 24h, M2 Bibs (Bioinformatics and BioStatistics), Paris-Sud University/École Polytechnique, France (J. Azé, S. Cohen-Boulakia, C. Froidevaux)

Advanced Data Bases and Data Mining, 42h, M2 Bibs (Bioinformatics and BioStatistics), Paris-Sud University/École Polytechnique, France (S. Cohen-Boulakia, C. Froidevaux)

Initiation to Research, 6h, M2 Bibs (Bioinformatics and BioStatistics), Paris-Sud University, France (C. Froidevaux)

Software Engineering for Bioinformatics, 48h, M2 BIBS (Bioinformatics and BioStatistics), Paris-Sud University/École Polytechnique, France (P. Amar)

Modelling and Simulation of Biological Processes, 24h, M2 BIBS (Bioinformatics and BioStatistics), Paris-Sud University/École Polytechnique, France (P. Amar)

Biological Networks and Systems Biology, 9h, M1 BIBS (Bioinformatics and BioStatistics), Paris-Sud University/École Polytechnique, France (P. Amar)

RNAomics and RNA Bioinformatics, 12h, M2 BIBS (Bioinformatics and BioStatistics), Paris-Sud University/École Polytechnique, France (A. Denise)

Theoretical Computer Science, 30h, M2 BIBS (Bioinformatics and BioStatistics), Paris-Sud University/École Polytechnique, France (A. Denise)

Supervision

HdR : Patrick Amar, Contributions à l'étude de la dynamique des systèmes biologiques et aux systèmes de calcul en biologie synthétique (Contributions to the study of the dynamics of biological systems and to computing systems in synthetic biology), Paris Sud University, 19/12/2013

PhD : Jiuqiang Chen, Designing scientific workflows following a structure- and provenance-aware strategy, Université Paris Sud, defended on 10/10/2013, S. Cohen-Boulakia and C. Froidevaux.

PhD in progress : Mélanie Boudard, Game theory and stochastic learning for predicting the three-dimensional structure of large RNA molecules, 15/10/2012, D. Barth (Univ. Versailles), J. Cohen (CNRS, LRI) and A. Denise.

PhD in progress : Marc Bouffard, Étude de circuits logiques moléculaires et détection de portes logiques dans un réseau métabolique (Study of molecular logic circuits and detection of logic gates in a metabolic network), Université Paris Sud, 01/10/2013, P. Amar and F. Molina.

PhD in progress : Bryan Brancotte, Ranking biological and biomedical data: algorithms and applications, Université Paris Sud, 01/10/2012, S. Cohen-Boulakia and A. Denise.

PhD in progress: Adrien Guilhot-Gaudeffroy, Modelling and scoring of protein-RNA complexes, 01/10/2011, J. Azé, J. Bernauer, C. Froidevaux.

PhD in progress: Daria Iakovishina, A Combinatorial Approach to Assembly Algorithms, 01/11/2011, M. Régnier.

PhD in progress : Adrien Rougny, Raisonnements sur des connaissances biologiques pour la construction et l'analyse des réseaux de signalisation (Reasoning on biological knowledge for the construction and analysis of signalling networks), 01/10/2013, C. Froidevaux.

PhD in progress : Antoine Soulé, Evolutionary study of RNA-RNA interactions in yeast, 01/09/2013, J.-M. Steyaert and J. Waldispühl (McGill University, Canada).

PhD in progress : Bo Yang, Bioinformatics approaches for studying the relations between RNA structure and pre-messenger RNA splicing, 01/10/2011, A. Denise and Fu Xiangdong (Wuhan University, China)

PhD in progress : Cong Zeng, Identification of structural motifs in messenger RNAs, 01/10/2011, A. Denise

Juries

HDR

Ch. Froidevaux was a reviewer for an HDR (Montpellier).

J.-M. Steyaert served as a jury member for Hubert Lincet's HDR defence (Caen).

PhD

P. Amar served as a referee for Laurent Crepin's PhD defence (Brest University).

Ch. Froidevaux served as a referee for a PhD thesis in Rennes and was a member of J. Leblay's PhD committee.

M. Régnier served as a referee for O. Abdou Arbi's PhD defence (Rennes University).

Funding agencies

ANR 2012-2013, SIMI2, J. Bernauer and S. Cohen-Boulakia

UEFISCDI 2011-2013 (Research Council Romania), Y. Ponty

Selection committees

Cnrs CR/DR: national committee (Comité national, Section 6 and CID 51), Y. Ponty.

Inria CR2/CR1 committee: Saclay, J. Bernauer.

Maître de conférences (Associate Professor): Paris-Sud, Computer Science Department, S. Cohen-Boulakia and J. Azé.

Maître de conférences (Associate Professor): Bordeaux I, A. Denise.

Ingénieur de recherche (Research Engineer): LIP6 UPMC (Paris), Y. Ponty.

Chargés d'enseignement and Professor: Ecole Polytechnique, M. Régnier and J.-M. Steyaert.

Popularization

Outreach seminar at Lycée Blaise Pascal (Orsay, France): Yann Ponty gave a popular science seminar (2h), jointly organised by Inria Saclay and the Académie de Versailles.

Unithé ou Café, Inria Saclay popularization seminar: "Les briques de construction de la vie" (The building blocks of life), see https://intranet.saclay.inria.fr/vie-du-centre/unithe-cafe/rencontres-2013/briques-construction-vie.

We also took part in a few technology-transfer events. RNA structural studies were presented at the Rencontres Inria Industrie - Modélisation, simulation et calcul intensif in June 2013 (http://www.inria.fr/centre/saclay/innovation/rii-modelisation-simulation-calcul-intensif/presentation). This led to an invitation to the Sanofi Pharmacometry and Bioinformatics Day in December 2013.

Journal of Molecular Biology September 2011 http://hal.inria.fr/inria-00637848 in press Multiscale modeling of macromolecular biosystems Samuel C Flores S. C. Julie Bernauer J. Seokmin Shin S. Ruhong Zhou R. Xuhui Huang X. Briefings in Bioinformatics 13 4 July 2012 395-405 http://hal.inria.fr/hal-00684530 Apprentissage de fonctions de tri pour la prédiction d'interactions protéine-ARN Adrien Guilhot-Gaudreffroy A. Jérôme Azé J. Julie Bernauer J. Christine Froidevaux C. Extraction et Gestion des Connaissances Rennes, France accepted 2014 Exploration of uncharted regions of the protein universe Lukasz Jaroszewski L. Zhanwen Li Z. S Sri Krishna S. S. Constantina Bakolitsa C. John Wooley J. Ashley M. Deacon A. M. Ian A. Wilson I. A. Adam Godzik A. PLoS Biol 7 9 Sep 2009 http://dx.doi.org/10.1371/journal.pbio.1000205 Automated prediction of three-way junction topological families in RNA secondary structures Alexis Lamiable A. Dominique Barth D. Alain Denise A. Franck Quessette F. Sandrine Vial S. Eric Westhof E. Computational Biology and Chemistry 37 January 2012 1-5 http://hal.inria.fr/hal-00641738 A global sampling approach to designing and reengineering RNA secondary structures. Alex Levin A. Mieszko Lis M. Yann Ponty Y. Charles W O'Donnell C. W. Srinivas Devadas S. Bonnie Berger B. Jérôme Waldispühl J. Nucleic Acids Research 40 20 November 2012 10041-52 http://hal.inria.fr/hal-00733924 ESBTL: efficient PDB parser and data structure for the structural and geometric analysis of biological macromolecules. Sébastien Loriot S. Frederic Cazals F. Julie Bernauer J. Bioinformatics 26 8 April 2010 1127-8 http://hal.inria.fr/inria-00536404 Sub-angstrom accuracy in protein loop reconstruction by robotics-inspired conformational sampling. Daniel J. Mandell D. J. Evangelos A. Coutsias E. A. Tanja Kortemme T. Nat Methods 6 8 Aug 2009 551–552 http://dx.doi.org/10.1038/nmeth0809-551 Kinematic manipulation of molecular chains subject to rigid constraints D. Manocha D. Y. Zhu Y. 
Proc Int Conf Intell Syst Mol Biol 2 1994 285–293 Conformational analysis of molecular chains using nano-kinematics D. Manocha D. Y. Zhu Y. W. Wright W. Comput Appl Biosci 11 1 Feb 1995 71–86 The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data M. Parisien M. F. Major F. Nature 452 7183 2008 51–55 Profile of Tries Gayun Park G. Hsien-Kuei Hwang H.-K. Pierre Nicodème P. Wojciech Szpankowski W. SIAM Journal on Computing 38 5 2009 1821-1880 http://hal.inria.fr/hal-00781400 Efficient sampling of RNA secondary structures from the Boltzmann ensemble of low-energy: The boustrophedon method Yann Ponty Y. Journal of Mathematical Biology 56 1-2 Jan 2008 107–127 http://www.lri.fr/~ponty/docs/Ponty-07-JMB-Boustrophedon.pdf GenRGenS: Software for Generating Random Genomic Sequences and Structures Y. Ponty Y. M. Termier M. A. Denise A. Bioinformatics 22 12 2006 1534–1535 http://hal.inria.fr/inria-00601060 ACoM: A classification method for elementary flux modes based on motif finding Sabine Pérès S. F. Vallée F. Marie Beurton-Aimar M. Jean-Pierre Mazat J.-P. BioSystems 103 3 2011 410-419 http://hal.inria.fr/hal-00642137 Using the Fast Fourier Transform to Accelerate the Computational Search for RNA Conformational Switches Evan Senter E. Saad Sheikh S. Ivan Dotu I. Yann Ponty Y. Peter Clote P. PLoS ONE 7 12 December 2012 http://hal.inria.fr/hal-00769740 Evaluating mixture models for building RNA knowledge-based potentials Adelene Y L Sim A. Y. L. Olivier Schwander O. Michael Levitt M. Julie Bernauer J. Journal of Bioinformatics and Computational Biology 10 2 April 2012 1241010 http://hal.inria.fr/hal-00757761 Modeling and predicting super-secondary structures of transmembrane beta-barrel proteins Thuong Van Du Tran T. V. D. Ecole Polytechnique X December 2011 http://hal.inria.fr/tel-00647947 THESE An unbiased adaptive sampling algorithm for the exploration of RNA mutational landscapes under evolutionary pressure Jérôme Waldispühl J. Yann Ponty Y. 
Journal of Computational Biology 18 11 November 2011 1465-79 http://hal.inria.fr/hal-00681928 Real-space protein-model completion: an inverse-kinematics approach Henry van den Bedem H. Itay Lotan I. Jean Claude Latombe J. C. Ashley M. Deacon A. M. Acta Crystallogr D Biol Crystallogr 61 Pt 1 Jan 2005 2–13 http://dx.doi.org/10.1107/S0907444904025697