The research domain of the bioinformatics Dyliss team is sequence analysis and systems biology. Our main goal in biology is to characterize groups of genetic actors that control the phenotypic response of species challenged by their environment. The team explores methods in the field of formal systems, more precisely in knowledge representation, constraint programming, multi-scale analysis of dynamical systems, and machine learning. Our goal is to identify key regulators of the environmental response by structuring and reasoning on information which combines physiological responses measured with omics technologies (RNA-seq, metabolomics, proteomics), genetic information from distantly related species, and knowledge about regulation and metabolic pathways stored in public repositories.
The main challenges we face are data incompleteness and heterogeneity. Rather than searching for a single optimized model, we favor the construction and study of a "space of feasible models or hypotheses" that includes the known constraints and facts about a living system. We develop methods allowing a precise investigation of this space of hypotheses. We are therefore in a position to design experimental strategies that progressively shrink the space of hypotheses and improve our understanding of the system. Importantly, our models span a rather large spectrum of discrete structures: oriented graphs, Boolean networks, automata, or expressive grammars.
More concretely, the steps of the analysis are to (i) formalize and integrate, as a set of logical or grammatical constraints, both generic knowledge (literature-based regulatory pathways, diversity of molecular functions, DNA patterns associated with molecular mechanisms) and species-specific information (physiological response to perturbation, sequencing data...); (ii) investigate the space of admissible models and exhibit its main features by solving combinatorial optimization problems; (iii) identify the corresponding genomic products within sequences. At each of these steps, we rely on symbolic methods for model-space exploration (ontologies and formal concept analysis).
We target applications for which large-scale heterogeneous data about a specific but complex physiological phenotype are available. Existing long-term partnerships with biological labs give strong support to this choice. In marine biology, we collaborate closely with the Station Biologique de Roscoff (Idealg, Investissements d'avenir "Bioressources et Biotechnologies"). We also collaborate with other Inria teams in the IPL Algae In Silico project to understand the metabolism of a microalga. In environmental microbiology, we collaborate with the CRG in Chile in the framework of the Ciric Chilean Inria center (Ciric-Omics). In agriculture, our main partners are within the INRA institute in Rennes, with a focus on understanding the pea aphid's reproduction mode and the metabolism of farm animals (pig, chicken, cow). More recently, we have introduced health as a new application field of the team, especially through the study of large-scale Boolean networks and their confrontation with knowledge repositories (collaboration with Inserm, CHU Rennes and Sanofi).
Biological networks are built with data-driven approaches aiming at translating genomic information into a functional map. Most methods are based on a probabilistic framework which defines a probability distribution over the set of models. The reconstructed network is then defined as the most likely model given the data.
Our team has investigated an alternative perspective where each dataset induces a set of constraints – related to the steady-state response of the system dynamics – on the set of possible values in a network of fixed topology. The methods that we have developed complete the network with product states at the level of nodes and influence types at the level of edges, so as to globally explain the experimental data. In other words, relevant information is no longer selected by picking the network with the highest score, but rather by exploring the complete space of models satisfying constraints on the possible dynamics supported by prior knowledge and observations. In the (common) case where no model satisfies all the constraints, we relax the problem by introducing new combinatorial optimization problems that allow the data or the knowledge to be corrected. Properties common to all solutions are considered robust information about the system, as they are independent of the choice of any single solution to the optimization problem.
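The flavor of this constraint-based exploration can be sketched in a few lines of Python. This is a deliberately tiny brute-force version of the idea; the influence graph, observations and consistency rule below are invented for illustration, and our actual tools delegate this search to ASP solvers.

```python
from itertools import product

# Hypothetical toy influence graph: edges with unknown signs (+1 or -1).
edges = [("A", "B"), ("A", "C"), ("B", "C")]
# Observed steady-state shifts after a perturbation: +1 = up, -1 = down.
obs = {"A": +1, "B": -1, "C": +1}

def consistent(signs):
    # Each non-input node's shift must be explained by at least one
    # predecessor: sign(source) * sign(edge) == sign(target).
    for node in obs:
        preds = [(s, sg) for (s, t), sg in zip(edges, signs) if t == node]
        if preds and not any(obs[s] * sg == obs[node] for s, sg in preds):
            return False
    return True

# Explore the complete space of sign labelings instead of one "best" model.
models = [dict(zip(edges, signs))
          for signs in product((+1, -1), repeat=len(edges))
          if consistent(signs)]
# Properties shared by all surviving models are robust conclusions.
shared = {e for e in edges if len({m[e] for m in models}) == 1}
```

Here three labelings survive, but all of them agree that A inhibits B: that sign is a robust conclusion, independent of which single model one would have picked.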
Solving these computational issues requires addressing NP-hard qualitative (non-temporal) problems. We have developed a long-term collaboration with Potsdam University in order to use a logical paradigm named Answer Set Programming (ASP) to solve these constraint-satisfiability and combinatorial optimization issues. Applied to transcriptomic or cancer networks, our methods identified which regions of a large-scale network should be corrected, and proposed robust corrections. This result suggested that the approach was compatible with the efficiency, scale and expressivity needed for biological systems.
During the last years, our goal was to provide formal models of queries on biological networks, with a focus on integrating dynamical information as explicit logical constraints in the modeling process. Using these technologies requires revisiting and reformulating the constraint-satisfiability problems at hand, both to decrease the size of the search space in the grounding part of the process and to improve the exploration of this search space in the solving part. Concretely, producing a logical encoding of the optimization problems forces us to clarify the roles of, and dependencies between, the parameters involved. This paves the way to a refinement approach based on a fine investigation of the space of hypotheses in order to make it smaller and improve our understanding of the system. Our studies confirmed that logical paradigms are a powerful approach to build and query reconstructed biological systems, complementing discriminative ("black-box") approaches based on statistical machine learning. Based on these technologies, we have developed a panel of methods allowing the integration of multi-scale data and knowledge, linking genomics, metabolomics, expression data and protein measurements of several phenotypes.
Notice that our main concern lies in the field of knowledge representation. More precisely, we do not wish to develop new solvers or grounders, a self-contained computational issue which is addressed by specialized teams such as our collaborators in Potsdam. Our goal is rather to investigate whether the constant progress in the field of constraint logic programming, shown by the performance of ASP solvers, is sufficient to address the complexity of the constraint-satisfiability and combinatorial optimization issues explored in systems biology. In this direction, we work in close interaction with Potsdam University to feed their research activities with challenging issues from bioinformatics and, in return, benefit from the prototypes they develop.
By exploring the complete space of models, our approach typically produces numerous candidate models compatible with the observations. We began investigating to what extent domain knowledge can further refine the analysis of the set of models, by identifying classes of similar models or by selecting a subset of models that satisfy an additional constraint (for instance, best fit with a set of experiments, or minimal size). We anticipate that this will be particularly relevant when studying non-model species, for which little is known but valuable information from other species can be transposed or adapted. These efforts consist in developing reasoning methods based on ontologies as a formal representation of symbolic knowledge. We use Semantic Web tools such as SPARQL for querying and integrating large sources of external knowledge, and measures of semantic similarity and particularity for analyzing data.
As explained below, Answer Set Programming technologies enable the identification of key controllers based on the integration of static data. As a natural follow-up, we also develop optimization techniques to learn models of the dynamics of a biological system. As before, our strategy is not to select a single model fitting the experimental data but rather to decipher the complete set of families of models which are compatible with the observed response. Our main research line in this field is to determine the appropriate level of expressivity (in terms of constraints) allowing both to properly capture the nature of the data and knowledge and to allow an exhaustive study of the space of feasible models. To implement this strategy, we rely on several constraint programming frameworks, depending on the model scale and the nature of the kinetic time-point measurements. Logic programming (Answer Set Programming) is used to decipher the combinatorics of synchronous Boolean networks explaining static or dynamic responses of signaling networks to perturbations (such as measured by phosphoproteomics technologies). SAT-based approaches are used to decipher the combinatorics of large-scale asynchronous Boolean networks. In order to gain expressivity, we model these networks as guarded-transition networks, an extension of Petri nets. Finally, classical learning methods are used to build ad-hoc parameterized numerical models that provide the most parsimonious explanations of experimental measurements.
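As a minimal illustration of the synchronous Boolean semantics mentioned above, the following sketch updates every node from the same previous state and enumerates fixed points by exhaustion. The three update rules are invented toys, not drawn from a real signaling model.

```python
from itertools import product

# Hypothetical 3-gene synchronous Boolean network (illustrative rules).
rules = {
    "x": lambda s: s["y"],
    "y": lambda s: s["x"] and not s["z"],
    "z": lambda s: s["x"],
}

def step(state):
    # Synchronous update: every node reads the same previous state.
    return {n: f(state) for n, f in rules.items()}

def fixed_points():
    # Exhaustive enumeration of the 2^n states (fine for toy sizes).
    return [dict(zip(rules, bits))
            for bits in product((False, True), repeat=len(rules))
            if step(dict(zip(rules, bits))) == dict(zip(rules, bits))]

fps = fixed_points()
```

For large-scale or asynchronous networks this brute force is hopeless, which is precisely why the constraint-solving machinery (ASP, SAT) described above is needed.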
Once groups of genome products involved in the response of the species have been identified with integrative or dynamical methods, it remains to characterize the biological actors within genomes. To that end, we learn, model and parse formal patterns within DNA, RNA or protein sequences. More precisely, our research on modeling biomolecular sequences with expressive formal grammars focuses on learning such grammars from examples, helping biologists design their own grammars, and providing practical parsing tools.
Concerning the development of machine learning algorithms for the induction of grammatical models, we have strong expertise in learning finite-state automata. We have proposed an algorithm that successfully learns automata modeling (non-homologous) functional families of proteins, leading to a tool named Protomata-learner. The algorithm is based on a fragment-merging heuristic that exploits the partial and local alignments contained in a family of sequences. As an example, this tool allowed us to properly model the TNF protein family, a difficult task for classical probability-based approaches. It was also applied successfully to model important enzymatic families of proteins in cyanobacteria. Our future goal is to further demonstrate the relevance of formal language modeling by addressing the question of a fully automatic prediction, from sequence alone, of all the enzymatic families, aiming at further improving the sensitivity and specificity of the models. As enzyme-substrate interactions are highly specific relations, central to integrated genome/metabolome studies, and are characterized by faint signatures, we shall rely on models of active sites involved in cellular regulation or catalysis mechanisms. This requires building models gathering both structural and sequence information, in order to describe (potentially nested or crossing) long-range dependencies such as contacts between amino acids that are far apart in the sequence but close in the 3D protein fold. Our current research focuses on the inference of context-free grammars including the topological information coming from the structural characterization of active sites.
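Once such an automaton is learnt, scanning a candidate sequence is a standard nondeterministic-automaton simulation. The automaton below is a hand-made toy signature (not an actual Protomata-learner output), with "*" transitions acting as wildcards for any amino acid.

```python
# Toy automaton for a gapped signature C-x-C-x*-H (invented example).
transitions = {
    (0, "C"): 1, (1, "*"): 2, (2, "C"): 3,
    (3, "*"): 3, (3, "H"): 4,   # the self-loop models a variable spacer
}
ACCEPT = {4}

def accepts(seq):
    # Standard NFA simulation: track the set of reachable states.
    states = {0}
    for aa in seq:
        states = {transitions[(s, c)]
                  for s in states for c in (aa, "*")
                  if (s, c) in transitions}
        if not states:
            return False
    return bool(states & ACCEPT)
```

Real learnt automata additionally carry scores on transitions, so membership becomes a best-path computation rather than a yes/no parse.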
Using context-free grammars instead of regular patterns increases the complexity of parsing. Indeed, efficient parsing tools have been developed to identify patterns within genomes, but most of them are restricted to simple regular patterns. Definite Clause Grammars (DCGs), a particular form of logical context-free grammars, have been used in various works to model DNA sequence features. An extended formalism, String Variable Grammars (SVGs), introduces variables that can be bound to a string during a pattern search. This increases the expressivity of the formalism towards mildly context-sensitive grammars. Thus, those grammars model not only DNA/RNA sequence features but also structural features such as repeats, palindromes, stem-loops or pseudo-knots. A few years ago, we designed a first tool, STAN (suffix-tree analyser), to make it possible to search for a subset of SVG patterns in full chromosome sequences. This tool was used for the recognition of transposable elements in Arabidopsis thaliana. We have built on this experience with a new modeling language, called Logol. Generally, a suitable language for the search of particular components has to meet several needs: expressing existing structures in a compact way, using existing databases of motifs, and supporting the description of interacting components. In other words, the difficulty is to find a good tradeoff between expressivity and complexity to allow the specification of realistic models at genome scale. The Logol language and associated framework have been built in this direction. The specificity of Logol among other SVG-like languages mainly lies in the systematic introduction of constraints on string variables.
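Part of this extra expressivity can be glimpsed with ordinary library regexes: a capturing group plus a backreference already behaves like a string variable and captures the copy language needed for direct repeats, which plain regular patterns cannot express. The sequence and length bounds below are arbitrary toy choices.

```python
import re

# SVG-style "string variable" sketched as a capturing group plus a
# backreference: a 4-nt word, a 1-10 nt spacer, then the SAME word
# again (a direct repeat).
repeat = re.compile(r"([ACGT]{4})[ACGT]{1,10}\1")

seq = "TTGACGGTACATGACGGTT"
m = repeat.search(seq)
word = m.group(1) if m else None
```

Backreferences stop short of what SVG-like languages offer (reverse complements, mismatch tolerance, cross-referenced motif databases), which is where dedicated engines such as Logol come in.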
All the methods presented in the previous sections usually result in pools of candidates which equivalently explain the data and knowledge. These candidates can be dynamical systems, compounds, biological sequences, proteins... In any case, the output of our formal methods generally requires a posteriori investigation and filtering by domain experts. In order to assist them, we rely on two classes of symbolic techniques: Semantic Web technologies and Formal Concept Analysis (FCA). Both aim at the formalization and management of knowledge, that is, at making explicit the relations occurring in structured data. These techniques complement each other: the production of relevant concepts in FCA highly depends on the availability of semantic annotations using a controlled set of terms and, conversely, building and exploiting ontologies is a complex process that can be made much easier with FCA.
Integrating heterogeneous data with Semantic Web technologies. The emergence of ontologies in biomedical informatics and bioinformatics happened in parallel with the development of the Semantic Web in the computer science community. Let us recall that the Semantic Web is an extension of the current Web that provides an infrastructure integrating data and ontologies in order to support unified reasoning. Since the beginning, life sciences have been a major application domain for the Semantic Web. This was motivated by the joint evolution of data acquisition capabilities in the biomedical field and of the methods and infrastructures supporting data analysis (grids, the Internet...), resulting in an explosion of data production in complementary domains. Consequently, Semantic Web technologies have become an integral part of translational medicine and translational bioinformatics. The Linked Open Data project promotes the integration of data sources in machine-processable formats compatible with the Semantic Web, with a strong involvement of life sciences in this initiative.
However, a specificity of the life sciences "data deluge" is that the proportion of generated data is much higher than in the more general "big data" phenomenon, and that these data are highly connected. The bottleneck that was once data scarcity now lies in the lack of adequate methods supporting data integration, processing and analysis. Each of these steps typically hinges on domain knowledge, which is why they resist automation. This knowledge can be seen as the set of rules stating under which conditions data can be used, or combined, to infer new data or new links between data.
In this setting, we are working on the integration of Semantic Web resources with our data analysis methods in order to take existing biological knowledge into account. We have introduced several methods to interpret semantic similarities and particularities , . We now focus our attention on the semi-automated construction of RDF abstractions of heterogeneous datasets which can be handled by non-expert users. This allows both to automatically prepare input datasets for the other methods developed in the team and to analyse the output of the methods in a wide knowledge context.
Using formal concept analysis to explore the results of bioinformatics analyses. Formal concept analysis aims at the development of conceptual structures which can be logically activated for the formation of judgments and conclusions. It is used in various domains managing structured data, such as knowledge processing, information retrieval or classification. In its simplest form, one considers a binary relation between a set of objects and a set of attributes. In this setting, formal concept analysis formalizes the semantic notions of extension and intension. Concepts are related within a lattice structure (Galois connection) by subconcept-superconcept relations, which allows causality relations between attribute subsets to be drawn. In bioinformatics, it has been used to derive phylogenetic relations among groups of organisms, a classification task that requires taking into account many-valued Galois connections. We have proposed, in a similar way, a classification scheme for the problem of assigning proteins to a set of protein families.
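The two derivation operators at the heart of FCA are simple enough to sketch directly. The protein/domain context below is a fabricated toy, and this naive closure-based enumeration is exponential; real FCA tools use dedicated algorithms such as NextClosure.

```python
from itertools import combinations

# Toy formal context: objects (proteins) x attributes (domains).
context = {
    "p1": {"kinase", "SH2"},
    "p2": {"kinase", "SH2", "SH3"},
    "p3": {"kinase"},
}

def intent(objects):
    # Attributes shared by every object of the set.
    return set.intersection(*(context[o] for o in objects))

def extent(attrs):
    # Objects carrying every attribute of the set.
    return {o for o, a in context.items() if attrs <= a}

def concepts():
    # A concept pairs a closed object set with a closed attribute set
    # (equivalently: a maximal object-attribute biclique). Naive
    # enumeration via closures of all non-empty object subsets.
    found = set()
    for r in range(1, len(context) + 1):
        for objs in combinations(sorted(context), r):
            a = intent(objs)
            found.add((frozenset(extent(a)), frozenset(a)))
    return found

cs = concepts()
```

The three concepts found here form the whole lattice of this toy context; reading it top-down recovers the implications between attribute subsets mentioned above (every SH2-carrying protein is also a kinase, for instance).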
One of the most important issues with concept analysis is that current methods remain very sensitive to the presence of uncertainty or incompleteness in the data. On the other hand, this apparent defect can be turned around to serve as a marker of incompleteness or inconsistency. Following this inspiration, we have proposed a methodology to tackle the problem of uncertainty in biological networks whose edges are mostly predicted links with a high rate of false positives. The general idea consists in looking for a tradeoff between the simplicity of the conceptual representation and the need to manage exceptions. As a very prospective challenge, we are exploring the idea of using ontologies to guide this process, and, conversely, of using concept analysis to help ontology refinement.
More generally, common difficult tasks in this context are visualization, search for local structures (graph mining) and network comparison. Network compression is a good solution for an efficient treatment of all these tasks. It has been used with success in power graphs, which are abstract graphs where nodes are clusters of nodes of the initial graph and edges represent bicliques between two sets of nodes. In fact, concepts are maximal bicliques, and we are currently developing the power graph idea in the framework of concept analysis.
Seven platforms have been developed in the team over the last five years: AskOmics, AuReMe, FinGoc, Caspo, Cadbiom, Logol, Protomata. Indeed, one of the team's goals is to facilitate interplay between tools for biological data analysis and integration. Improvements and novelties of these platforms are described in the "software" section. Our platforms aim at guiding the user to progressively reduce the space of models (families of sequences of genes or proteins, families of key actors involved in a system response, dynamical models) which are compatible with both knowledge and experimental observations.
Most of our platforms are developed with the support of the GenOuest resource and data center hosted in the IRISA laboratory, including its computing facilities [more info]. It is worth integrating them into larger dedicated environments to benefit from the expertise of other research groups. The BioShadock repository of the GenOuest platform allows one to share the different Docker containers that we are developing [website]. The Galaxy portal of the GenOuest platform now provides access to most tools for integrative biology and sequence annotation (access on demand).
Goal Integration and interrogation software for linked biological data based on semantic web technologies [url].
Description AskOmics aims at bridging the gap between end-user data and the Linked (Open) Data cloud. It allows heterogeneous bioinformatics data (formatted as tabular files or directly in RDF) to be loaded into a triple store system using a user-friendly web interface. AskOmics also provides an intuitive graph-based user interface supporting the creation of complex queries that currently require hours of manual searches across tens of spreadsheet files. The elements of interest selected in the graph are then automatically converted into a SPARQL query that is executed on the user's data.
Originality Our experience is that end users (i) do not benefit from all the information available in the LOD cloud repositories, for lack of SPARQL expertise (understandably: they are biologists, and most of them have no interest either in learning SPARQL or in learning how to integrate data); and (ii) do not contribute their data back to the LOD cloud: again, they have neither the expertise nor the resources to produce and maintain datasets and the associated metadata as linked data, nor to maintain the underlying server infrastructure. Therefore there is a need to help end users (1) take advantage of the information readily available in the LOD cloud for analyzing their own data and (2) contribute back to the linked data by representing their data and the associated metadata in the proper format, as well as by linking them to other resources. In this context, the main originality is the graphical interface, which allows any SPARQL query to be built transparently and iteratively by a non-expert user.
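The kind of conjunctive query the graphical interface produces can be illustrated with a miniature triple-pattern matcher. The triples, predicate names and query below are invented; AskOmics itself emits real SPARQL against a triple store, and this sketch only mimics how variable bindings propagate across patterns.

```python
# Miniature SPARQL-style triple-pattern matching (toy data).
triples = [
    ("gene1", "regulates", "gene2"),
    ("gene2", "locatedOn", "chr1"),
    ("gene3", "locatedOn", "chr1"),
]

def query(patterns):
    # Each pattern is (s, p, o); strings starting with "?" are variables.
    bindings = [{}]
    for pat in patterns:
        new = []
        for env in bindings:
            for triple in triples:
                e, ok = dict(env), True
                for term, value in zip(pat, triple):
                    if term.startswith("?"):
                        if e.setdefault(term, value) != value:
                            ok = False
                    elif term != value:
                        ok = False
                if ok:
                    new.append(e)
        bindings = new
    return bindings

# "Which genes regulated by gene1 are located on chr1?"
res = query([("gene1", "regulates", "?g"), ("?g", "locatedOn", "chr1")])
```

Joining on the shared variable `?g` is exactly what the equivalent SPARQL basic graph pattern does, which is why a graph drawn by the user can be translated into a query mechanically.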
Application This software was developed in the context of the MirnAdapt (pea aphid) project in 2016. The tool has been presented to the agriculture communities in conferences, and to the Galaxy community. Up to now, more than 10 biological partner teams are currently testing and using the prototype software (rapeseed, pea aphids, copper microbiology, marine biology), and Sanofi has shown interest in co-developing the tool. Although its current user base belongs to the bioinformatics community, the scope of AskOmics is domain-independent and has the potential to reach a wider audience related to the Semantic Web community.
Goal Traceable reconstruction of metabolic networks [url].
Description The AuReMe toolbox allows for the Automatic Reconstruction of Metabolic networks based on the combination of multiple heterogeneous data and knowledge sources. It is available as a Docker image. AuReMe comprises five modules: 1) The model-management PADmet module allows all metabolic data to be manipulated and traced via a local database [package]. 2) The meneco Python package fills the gaps of a metabolic network using a topological approach, implemented as a logic program solving a combinatorial problem [Python package]. 3) The shogen Python package allows a genome and a metabolic network to be aligned in order to identify genome units which contain a high density of genes coding for enzymes; it also relies on a logic programming approach [Python package]. 4) The manual-curation assistance PADmet module allows the reported metabolic networks and their metadata to be curated. 5) The wiki-export PADmet module enables the export of the metabolic network and its functional genomic units as a local wiki platform allowing a user-friendly investigation [package].
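The topological criterion behind this kind of gap filling can be sketched as a forward closure over reactions: a target is producible when some chain of reactions, fed by the seed nutrients, reaches it. The reactions and metabolite names below are a fabricated toy; meneco itself encodes this in ASP and searches a reference database for minimal completions.

```python
# Toy producibility check in the spirit of topological gap filling.
reactions = {
    "r1": ({"A"}, {"B"}),        # reactants -> products
    "r2": ({"B", "E"}, {"C"}),
}
seeds, targets = {"A"}, {"C"}

def producible(extra=frozenset()):
    # Forward closure: fire every reaction whose reactants are available.
    avail = set(seeds) | set(extra)
    changed = True
    while changed:
        changed = False
        for ins, outs in reactions.values():
            if ins <= avail and not outs <= avail:
                avail |= outs
                changed = True
    return avail

# C is blocked: cofactor E is never producible -> a gap to fill.
gap = targets - producible()
fixed = targets <= producible(extra={"E"})
```

Finding the smallest set of reactions (or, here, metabolites) that unblocks all targets is the combinatorial optimization problem the ASP encoding solves.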
Originality The main added-values are the inclusion of graph-based gap-filling tools that are particularly relevant for the study of non-classical organisms, the possibility to trace the reconstruction and curation procedures, and the representation and exploration of reconstructed metabolic networks with wikis.
Application The tools included in AuReMe have been used for reconstructing metabolic networks of micro- and macroalgae, extremophile bacteria, and communities of organisms in the context of the Idealg, Ciric-omics and IPL Algae In Silico projects.
Goal Filtering interaction networks with graph-based optimization criteria.
Description The goal is to offer a set of tools for the reconstruction of networks from genome, literature and large-scale observation data (expression data, metabolomics...) in order to elucidate the main regulators of an observed phenotype. Most of the optimization issues are addressed with Answer Set Programming. 1) The lombarde package filters transcription-factor/binding-site regulatory networks with the mutual information reported by the response to environmental perturbations. The high rate of false-positive interactions is reduced by filtering according to graph-based criteria. Knowledge about regulatory modules such as operons, or the output of the shogen package, can be taken into account [web server]. 2) The KeyRegulatorFinder package allows searching for key regulators of lists of molecules (such as metabolites, enzymes or genes) by taking advantage of knowledge databases on cell metabolism and signaling. The complete information is transcribed into a large-scale interaction graph, which is filtered to report the most significant upstream regulators of the considered list of molecules [package]. 3) The powerGrasp Python package provides an implementation of graph compression methods oriented toward visualization and based on power graph analysis [package]. 4) The iggy package enables the repair of an interaction graph with respect to expression data. It proposes a range of operations for altering experimental data and/or a biological network in order to re-establish their mutual consistency, an indispensable prerequisite for automated prediction. For accomplishing repair and prediction, we take advantage of the distinguished modeling and reasoning capacities of Answer Set Programming [Python package].
Originality The main added-value of these tools is to make explicit the criteria used to highlight the role of the main regulators: the underlying methods encode explicit graph-based criteria instead of relying on statistical approaches. This makes it possible to explain local relationships and patterns within interaction graphs by explicit biological relationships.
Application The tools have been used to identify the main gene regulators of the response of pigs to several diets. They were also used to decipher regulators of reproduction in the pea aphid, an insect that is a plant pest.
Goal Studying synchronous Boolean networks [url]
Description Cell ASP Optimizer (Caspo) constitutes a pipeline for automated reasoning on logical signaling networks. The main underlying issue is that, given inherent experimental noise, many different logical networks can be compatible with a set of experimental observations. It is available as a Docker container. Caspo comprises five modules: 1) The Caspo-learn module performs automated inference of logical networks from experimental data, identifying all admissible large-scale logic models, saving considerable effort and avoiding any a priori bias. 2) The Caspo-classify, predict and visualize modules allow a family of Boolean networks to be classified with respect to their input-output predictions. 3) The Caspo-design module designs experimental perturbations allowing an optimal discrimination of rival models in a family of Boolean networks. 4) The Caspo-control module identifies key players of a family of networks: it computes robust intervention strategies that force a set of target species or compounds into a desired steady state. 5) The Caspo-timeseries module takes into account time-series observation datasets in the learning procedure [Python package and Docker container].
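In miniature, the learning step amounts to keeping every candidate logic rule that reproduces all perturbation experiments. The gate candidates and readouts below are invented; caspo does this at scale with ASP, over whole networks and with a tolerance for noisy measurements.

```python
# Toy caspo-learn flavor: which logic gate for a readout node is
# compatible with all perturbation experiments? (fabricated screen)
gates = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "A only": lambda a, b: a,
}
# (stimulus_a, stimulus_b) -> observed readout after perturbation
experiments = {(True, False): False, (True, True): True, (False, True): False}

compatible = [name for name, f in gates.items()
              if all(f(a, b) == out for (a, b), out in experiments.items())]
```

When several gates survive, they form the family of admissible models whose classification and discriminating-experiment design are handled by the other Caspo modules.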
Originality The Caspo modules provide friendly and efficient solutions to problems that were previously addressed in theoretical papers with MILP programs. The main advantage is that it enables a complete study of logical networks without requiring any linear constraint program.
Application The Caspo tool was initiated in the framework of the BioTempo project. Caspo-learn has been included as a module for learning logical networks from early steady-state data in CellNOpt, a generic platform which implements several methods for learning and studying signaling networks at different modeling levels (from logical models to numerical models).
Goal Building and analyzing the asynchronous dynamics of enriched logical networks [url]
Description Based on guarded-transition semantics, the Cadbiom software provides a formal framework to help the modeling of biological systems such as cell signaling networks. It allows synchronization events to be investigated in biological networks. It is available as a Docker image. Cadbiom comprises three modules: 1) The Cadbiom graphical interface is useful to build and study moderate-size models. It provides exploration, simulation and checking. For large-scale models, Cadbiom also allows focusing on specific nodes of interest. 2) The Cadbiom API allows a model to be loaded, static analyses to be performed, and temporal properties to be checked on a finite horizon, in the future or in the past. 3) For the exploration of large-scale knowledge repositories, the PID repository (about 10,000 curated interactions) has been translated into the Cadbiom formalism.
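A guarded transition can be sketched as a place-consuming rule that fires only when a side condition on other places holds. The places, transitions and guards below are invented, and this synchronous toy step only conveys a crude flavor of the actual Cadbiom semantics.

```python
# Toy guarded-transition step: (source, target, guard), where the guard
# is a set of places that must be active for the transition to fire.
transitions = [
    ("ligand", "receptor_active", {"receptor"}),
    ("receptor_active", "gene_on", set()),
]

def step(active):
    # Fire every enabled transition against the same previous state:
    # the source place is consumed, the target place is produced.
    nxt = set(active)
    for src, tgt, guard in transitions:
        if src in active and guard <= active:
            nxt.discard(src)
            nxt.add(tgt)
    return nxt

s0 = {"ligand", "receptor"}
s1 = step(s0)   # ligand consumed, receptor_active produced
s2 = step(s1)   # signal propagates to gene_on
```

The guard is what distinguishes this from a plain Petri-net firing rule: `receptor` enables the first transition without being consumed by it.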
Originality Model-checking approaches applied to Boolean or multivalued networks allow the trajectories of the system to be studied exhaustively, but they can only be applied to small networks. In contrast, Cadbiom is able to handle large-scale knowledge databases.
Application The Cadbiom tool was applied to study the regulators of the TGF-
Goal Complex pattern modelling and matching [url]
Description The Logol toolbox is a Swiss-army knife for pattern matching on DNA/RNA/protein sequences, using a high-level grammatical formalism to permit a large expressivity for patterns. A Logol pattern can consist of a complex combination of motifs (such as degenerate strings) and structures (such as imperfect stem-loops or repeats). Logol's key features are the possibility to divide a pattern description into several sub-patterns, to model long-range dependencies, to use ambiguous models, and to include negative conditions in a pattern definition. The LogolMatch parser takes as input a biological sequence and a grammar file. It returns an XML file containing all the occurrences of the pattern in the sequence, with their parsing details. The input sequences can be genomes from biological banks.
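One structural pattern such a grammar expresses in a single rule — a stem of reverse-complementary strands around a bounded loop — can be mimicked by a small scanner. The stem/loop parameters and sequence below are toy choices; the Logol engine handles far richer patterns, mismatches and ambiguity included.

```python
# Toy stem-loop scanner: a word followed, after a 3-8 nt loop, by its
# reverse complement.
COMP = str.maketrans("ACGT", "TGCA")

def revcomp(s):
    return s.translate(COMP)[::-1]

def find_stem_loops(seq, stem=4, loop=(3, 8)):
    hits = []
    for i in range(len(seq) - stem):
        left = seq[i:i + stem]
        for k in range(loop[0], loop[1] + 1):
            j = i + stem + k            # start of the right strand
            if seq[j:j + stem] == revcomp(left):
                hits.append((i, k))     # (stem start, loop length)
    return hits

hits = find_stem_loops("AAGACTTTTTAGTCAA")
```

Because the right strand must equal a transformed copy of the left one, this pattern is beyond regular expressions, which is exactly the class of dependency string variables are introduced for.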
Originality Many pattern matching tools exist to efficiently model specific types of patterns: vmatch, patmatch, cutadapt, scoring matrices or profile HMMs. The main advantage of Logol is its very large expressivity: it encompasses most of the features of these specialized tools and enables interplay between several classes of patterns (motifs and structures).
Application The Logol tool was applied to the detection of mutated primers in a metabarcoding study, or to stem-loop identification (e.g. in CRISPR
Goal Expressive pattern discovery on protein sequences [url]
Description Protomata is a machine learning suite for the inference of automata characterizing (functional) families of proteins from available sequences. Based on partial and local alignments, Protomata learns precise characterizations of protein families, allowing new family members to be predicted with high specificity. The three main modules integrated in the Protomata-learner workflow are also available as stand-alone programs: 1) paloma builds partial local multiple alignments, 2) protobuild infers automata from these alignments, and 3) protomatch and protoalign scan, parse and align new sequences with the learnt automata. The suite is completed by tools to handle or visualize data, and can be used online by biologists via a web interface on the GenOuest platform. It is actively maintained (version v2.1 was released in April 2017) and we are planning a new major version with the enhanced scoring schemes that we have proposed.
Originality The main specificity of Protomata is that its power of characterization goes beyond the scope of classical sequence patterns such as PSSMs (e.g. the MEME suite), profile HMMs (e.g. the HMMER package), or Prosite patterns, allowing new family members to be predicted with high specificity.
Application The Protomata tool is used both to automatically update the Cyanolase database and, when combined with Formal Concept Analysis, for automated enzyme classification, such as the HAD superfamily of proteins in the framework of the Idealg project.
Our methods are applied in several fields of molecular biology.
Our main application field is marine biology, as it is a transversal field with respect to issues in integrative biology, dynamical systems and sequence analysis. Our main collaborators work at the Station Biologique de Roscoff. We are strongly involved in the study of brown algae: the meneco, memap and memerge tools were designed to carry out a complete reconstruction of metabolic networks for non-benchmark species. On the same application model, the pattern discovery tool protomata-learner, combined with supervised bi-clustering based on formal concept analysis, allows for the classification of sub-families of specific proteins. The same tool also allowed us to gain a better understanding of cyanobacteria proteins. At the larger level of 4D structures, classification techniques have also allowed us to introduce new methods for the characterization of viruses in marine metagenomic samples. Finally, in dynamical systems, we use asymptotic analysis (tool pogg) to decipher the initiation of sea urchin translation. We are currently involved in two new applications in this domain: the team participates in an Inria Project Lab program with the Biocore and Ange Inria teams, focused on the understanding of green micro-algae; and we are involved in deciphering phytoplankton variability at the systems biology level in collaboration with the Station Biologique de Roscoff (ANR Samosa).
In microbiology, our main issue is the understanding of bacteria living in extreme environments, mainly in collaboration with the bioinformatics group at Universidad de Chile (funded by CMM, CRG and Inria Chile). In order to elucidate the main characteristics of these bacteria, we develop efficient methods to identify the main groups of regulators of their specific response to their living environment. To that purpose, we use constraint-based modeling and combinatorial optimization. The integrative biology tools meneco, bioquali, ingranalysis, shogen and lombarde were designed in this context. In 2016, two applications focused on the study of consortia of extremophile bacteria were performed with these tools. In parallel, in collaboration with Ifremer (Brest), we have conducted similar work to decipher protein-protein interactions within archaebacteria. Our sequence analysis tool (logol) allowed us to build and maintain a very expressive CRISPR database.
Similarly, in environmental sciences, our goal is to propose methods to identify regulators of very complex phenotypes related to environmental issues. In collaboration with researchers from the Inra/Pegase laboratory, we develop methods to distinguish the responses of breeding animals to different diets or treatments and to characterize upstream transcriptional regulators, applied to pigs. Semantic-based analysis was useful for interpreting differences in gene expression in pork meat.
In addition, constraint-based programming also allows us to decipher regulators of reproduction for the pea aphid, an insect pest of plants. This was performed in collaboration with Inra/Igepp. This paved the way for the recent research track initiated in the team on the integration of heterogeneous data with RDF technologies (see the AskOmics software) and on graph compression (see the powergrasp software).
In bio-medical applications, we focus our attention on the confrontation of large-scale measurements with large-scale knowledge repositories about regulation pathways, such as Transpath, PID or Pathway Commons. In collaboration with Institut Curie, we have studied the Ewing sarcoma regulation network to test the capability of our tool bioquali to accurately correct and predict large-scale network behavior. Our ongoing studies in this field focus on the exhaustive learning of discrete dynamical networks matching experimental data, as a case study for modeling experimental design with constraint-based approaches. To that purpose, we collaborate with the group of J. Saez-Rodriguez at EBI and the group of N. Théret at Inserm/Irset (Rennes). The dynamical system tools caspo and cadbiom were designed within these collaborations. Ongoing studies focus on the understanding of the metabolism of xenobiotics (Mecagenotox program) and the filtering of sets of regulatory compounds within large-scale signaling networks (TGFSysBio project).
The team received a best paper award at the ICFCA conference.
Keywords: RDF - SPARQL - Querying - Graph - LOD - Linked open data
Functional Description: AskOmics allows heterogeneous bioinformatics data (formatted as tabular files) to be loaded into a triple store system using a user-friendly web interface. AskOmics also provides an intuitive graph-based user interface supporting the creation of complex queries that currently require hours of manual searches across tens of spreadsheet files. The elements of interest selected in the graph are then automatically converted into a SPARQL query that is executed on the users' data.
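The graph-to-SPARQL conversion can be illustrated by a small sketch (the prefix, class names and predicate below are invented for illustration; this is not the AskOmics internal representation):

```python
# Hypothetical sketch of the graph-to-SPARQL idea: a path of selected
# entities and relations is turned into a SPARQL query string.
def path_to_sparql(path, prefix="askomics", base="http://example.org/askomics#"):
    """path = [(var, class), relation, (var, class), ...] alternating."""
    lines = [f"PREFIX {prefix}: <{base}>",
             "SELECT " + " ".join(f"?{v}" for v, _ in path[::2]),
             "WHERE {"]
    # one type constraint per selected node
    for var, cls in path[::2]:
        lines.append(f"  ?{var} a {prefix}:{cls} .")
    # one triple pattern per selected edge
    for i in range(1, len(path), 2):
        s, _ = path[i - 1]
        o, _ = path[i + 1]
        lines.append(f"  ?{s} {prefix}:{path[i]} ?{o} .")
    lines.append("}")
    return "\n".join(lines)

# a user selecting "Gene located in QTL" in the graph interface
query = path_to_sparql([("gene", "Gene"), "is_located_in", ("region", "QTL")])
print(query)
```

The generated string is then submitted to the triple store, so the user never writes SPARQL by hand.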
News Of The Year: Several functionalities have been developed: 1) the capacity to integrate genomics data (import of GFF and BED files and generation of RDF compliant with the FALDO ontology); 2) the integration of data and knowledge in the OWL format to exploit biological information from external repositories, particularly from EBI and NCBI (notably, this functionality allows AskOmics to support the Gene Ontology, the Taxonomy ontology, as well as BioPAX biological networks); 3) improved user interface expressivity for generating SPARQL queries; 4) support for multiple concurrent user sessions, with a distinction between public and user-specific datasets; 5) deployment of AskOmics on the GenOuest cloud infrastructure to facilitate its release and diffusion; 6) interoperability between AskOmics and the Galaxy workflow environment.
Authors: Charles Bettembourg, Xavier Garnier, Anthony Bretaudeau, Fabrice Legeai, Olivier Dameron, Olivier Filangi and Yvanne Chaussin
Partners: Université de Rennes 1 - CNRS - INRA
Contact: Fabrice Legeai
Keywords: Metabolic networks - Bioinformatics - Workflow - Omic data - Toolbox - Data management - LOD - Linked open data
Functional Description: The main concept underlying padmet-utils is to provide solutions that ensure the consistency, internal standardization and reconciliation of the information used within any workflow that combines several tools for metabolic network reconstruction or analysis.
News Of The Year: In 2017, padmet-utils was enriched with an RDF export to allow interoperability between the AuReMe workspace for the reconstruction of metabolic networks and the AskOmics tool for querying heterogeneous data. Padmet-utils was also extended to handle metabolic networks in the SBML3 format.
Participants: Alejandro Maass, Meziane Aite and Anne Siegel
Partner: University of Chile
Contact: Anne Siegel
Computer Aided Design of Biological Models
Keywords: Health - Biology - Biotechnology - Bioinformatics - Systems Biology
Functional Description: Based on a guarded-transition semantics, this software provides a formal framework to support the modeling of biological systems such as cell signaling networks. It allows synchronization events in biological networks to be investigated.
Software development restarted in November 2016. The source code is available at the following address: https://
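The guarded-transition idea can be sketched as follows (assumed, much-simplified semantics with invented names; not the Cadbiom implementation):

```python
# Minimal sketch of a guarded-transition step: a transition fires when its
# source place is active and its guard holds over the current set of
# active places. Here a synchronous step fires all enabled transitions.
def step(active, transitions):
    """One synchronous step over a set of active places."""
    fired = [(src, dst) for src, dst, guard in transitions
             if src in active and guard(active)]
    # sources of fired transitions are consumed, destinations activated
    return (active - {s for s, _ in fired}) | {d for _, d in fired}

# Toy signaling fragment: receptor R activates kinase K only when ligand L
# is present; K then activates target gene G unconditionally.
transitions = [
    ("R", "K", lambda a: "L" in a),
    ("K", "G", lambda a: True),
]
state = {"L", "R"}
state = step(state, transitions)   # R -> K (guard: L present)
state = step(state, transitions)   # K -> G
print(sorted(state))
```

Guards make it possible to express that several conditions must hold simultaneously, which is the basis for investigating synchronization events.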
Participants: Geoffroy Andrieux, Michel Le Borgne, Nathalie Theret, Nolwenn Le Meur and Pierre Vignet
Contact: Anne Siegel
Crossroads in Metabolic Networks from Stoichiometric and Topological Studies
Keywords: Bioinformatics - ASP - Answer Set Programming - Constraint-based programming
Functional Description: This Python package for systems biology allows the identification of essential metabolites with respect to the production of targeted elements in a metabolic network, by comparing flux- and graph-based analyses. Conquests takes as input an SBML file describing a metabolic network and the name of the biomass reaction. The outputs are three sets of essential metabolites, computed according to three complementary criteria: graph-based accessibility of the targeted metabolites, the presence of flux in the biomass reaction, and the maximisation of flux in the biomass reaction.
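The graph-based accessibility criterion can be sketched as follows (assumed semantics on a toy network; real inputs are SBML models, not hand-written reaction lists):

```python
# Sketch of the graph-based criterion: a metabolite is producible if some
# reaction has all its substrates producible (forward closure from seeds);
# a metabolite is "essential" for a target if removing it from the network
# breaks the target's producibility.
def producible(reactions, seeds):
    """reactions: list of (substrate_set, product_set). Forward closure."""
    scope = set(seeds)
    changed = True
    while changed:
        changed = False
        for subs, prods in reactions:
            if subs <= scope and not prods <= scope:
                scope |= prods
                changed = True
    return scope

def essential_metabolites(reactions, seeds, target):
    """Metabolites whose removal cuts the target off from the seeds."""
    essentials = set()
    candidates = producible(reactions, seeds) - set(seeds) - {target}
    for m in candidates:
        pruned = [(s, p) for s, p in reactions if m not in s and m not in p]
        if target not in producible(pruned, seeds):
            essentials.add(m)
    return essentials

# two redundant routes A->B->C and A->D->C converge on C before the target T
rxns = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"D"}), ({"D"}, {"C"}),
        ({"C"}, {"T"})]
print(essential_metabolites(rxns, seeds={"A"}, target="T"))  # only C is essential
```

B and D are not essential because each can be bypassed by the other route; C is a crossroads that every route must pass through.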
News Of The Year: Conquests was released in 2017.
Contact: Julie Laniau
Interoperable infrastructure and implementation of a health data model for remote monitoring of chronic diseases with comorbidities In the context of telemedicine, we worked on a numerical application for monitoring patients with chronic diseases. We have developed a system based on a formal ontology that integrates alert information and patient data extracted from the electronic health record in order to better rank the importance of alerts. A pilot study was conducted on atrial fibrillation alerts. The results suggest that this approach has the potential to significantly reduce the alert burden in telecardiology. In 2017, we proposed an architecture supporting data exchange in the context of multiple chronic diseases [O. Dameron, Y. Rivault].
AskOmics, a web tool to integrate and query biological data using semantic web technologies
The software AskOmics has been adapted to two scientific topics important in agronomical and environmental sciences: plant genomic data and insect pest genomic data. With AskOmics, plant genomicists (from academic and private labs of the Rapsodyn project - Investment for the Future) working on rapeseed (Brassica napus) are able to tackle the question of which gene copy is active or repressed in key developmental processes related to seed quality and oil production, in the context of plant breeding. Additionally, entomologists use the tool to extract valuable knowledge on the way insect pests such as aphids are able to rapidly disseminate on crops, in the context of pesticide-free methods for plant protection. AskOmics has been presented to the international community of insect genomics (i5k: http://
A transcriptome multi-tissue analysis identifies biological pathways and genes associated with variations in feed efficiency of growing pigs Our work on the identification of upstream regulators within large-scale knowledge databases (prototype KeyRegulatorFinder) was valuable for identifying the main gene regulators of the response of pigs to several diets [F. Moreews, A. Siegel].
FCA in a Logical Programming Setting for Visualization-oriented Graph Compression We have explored the underlying idea of lossless network compression to address the problem of uncertainty in biological networks built from predictions, to help visualize the networks, and to classify their nodes in accordance with available annotations. Network compression has been used with success in Dresden (M. Schroeder) through a heuristic approach called Power Graph analysis, which builds abstract graphs where nodes are clusters of nodes of the initial graph and edges represent bicliques between two sets of nodes. First encouraging results have been presented (best paper award), showing that it is possible to mimic the Power Graph behaviour while opening the possibility of achieving better compression levels than alternative compression schemes. [L. Bourneuf, J. Nicolas]
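The compression principle can be illustrated with a toy sketch (brute-force and for illustration only; both Power Graph analysis and our FCA/ASP formulation work quite differently at scale):

```python
# Toy sketch of the Power Graph idea: detect one biclique (two node sets
# with all cross edges present) and replace its |S|*|T| edges by a single
# "power edge" between two power nodes.
from itertools import combinations

def find_biclique(edges, min_size=2):
    """Brute-force search for the largest biclique S x T in an edge list."""
    nodes = sorted({n for e in edges for n in e})
    eset = {frozenset(e) for e in edges}
    best = None
    for k in range(min_size, len(nodes) + 1):
        for S in combinations(nodes, k):
            # T = nodes (outside S) adjacent to every member of S
            T = [n for n in nodes if n not in S and
                 all(frozenset((s, n)) in eset for s in S)]
            if len(T) >= min_size:
                cand = (len(S) * len(T), set(S), set(T))
                if best is None or cand[0] > best[0]:
                    best = cand
    return best

# a small bipartite-like graph plus one extra edge (a, z)
edges = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("c", "x"),
         ("c", "y"), ("a", "z")]
saved, S, T = find_biclique(edges)
print(S, T, f"{saved} edges -> 1 power edge")
```

Here six of the seven edges collapse into one power edge between the power nodes {x, y} and {a, b, c}, which is exactly the kind of abstraction that eases visualization.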
Metabolic network completion and analysis We released the application paper of Meneco, a tool dedicated to the topological gap-filling of genome-scale draft metabolic networks. The tool reformulates gap-filling as a qualitative combinatorial optimization problem, omitting the constraints raised by stoichiometry, and solves this problem using Answer Set Programming. Run on an artificial test set of 10,800 degraded Escherichia coli networks, Meneco outperformed the stoichiometry-based tool GapFill in terms of precision. In addition, Meneco reports 10 times fewer putative reactions than the MILP-based tool FastGapFill for an equivalent precision. This is a strong advantage for manual curation post-processing, since curating 50 to 80 reactions is still feasible whereas manually curating 800 reactions is out of reach. Meneco was applied to the reconstruction and understanding of a strain pathogenic to salmon. [C. Frioux, J. Got, A. Siegel]
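The topological gap-filling problem can be sketched as follows (a naive greedy illustration on a toy network; Meneco itself computes minimal completions with Answer Set Programming, not greedily):

```python
# Hedged sketch of topological gap-filling: add repair-database reactions
# until every target becomes reachable from the seeds under the graph
# producibility criterion (forward closure, no stoichiometry).
def scope(reactions, seeds):
    reach, changed = set(seeds), True
    while changed:
        changed = False
        for subs, prods in reactions:
            if subs <= reach and not prods <= reach:
                reach |= prods
                changed = True
    return reach

def greedy_gapfill(draft, repair_db, seeds, targets):
    chosen = []
    while not targets <= scope(draft + chosen, seeds):
        # pick the repair reaction whose addition enlarges the scope the most
        gains = [(len(scope(draft + chosen + [r], seeds)), r)
                 for r in repair_db if r not in chosen]
        best_gain, best = max(gains, key=lambda g: g[0])
        if best_gain == len(scope(draft + chosen, seeds)):
            raise ValueError("targets unreachable with this repair database")
        chosen.append(best)
    return chosen

draft = [({"A"}, {"B"})]                                   # incomplete draft
repair = [({"B"}, {"C"}), ({"C"}, {"T"}), ({"X"}, {"Y"})]  # candidate reactions
print(greedy_gapfill(draft, repair, seeds={"A"}, targets={"T"}))
```

The greedy choice can miss globally minimal completions; encoding the problem declaratively, as Meneco does, guarantees minimality and enumerates alternatives.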
Toward the study of metabolic functions in communities of organisms In a first study, we provided an example of how to use topological metabolic modeling to assess the complementarity between two members of an algal ecosystem. Since then, we have generalized the selection of subcommunities of interest and proposed likely interactions that could occur between seaweeds and their associated bacteria. A focus has also been placed on plant microbiota and the reasons underlying the organization of the community. Altogether, these ongoing works enable a better understanding of holobiont organization and functioning. [M. Aite, M. Chevallier, C. Frioux, J. Got, A. Siegel, C. Trottier]
Hybrid Metabolic Network Completion In order to improve the precision of gap-filling approaches, we introduced a hybrid approach that formally reconciles existing stoichiometric and topological approaches to network completion in a unified formalism. A hybrid ASP encoding based on a MILP constraint propagator was developed. It relies upon the theory reasoning capacities of the ASP system Clingo to solve the resulting logic program with linear constraints over the reals. In short, this technology made it possible to combine the best of the combinatorial problem solver Clingo with the MILP solver CPLEX. Run on the artificial test set of 10,800 degraded Escherichia coli networks introduced in , our approach yielded results greatly superior to those obtainable from purely qualitative or MILP approaches. [C. Frioux, A. Siegel]
Combining graph and flux-based structures to decipher phenotypic essential metabolites within metabolic networks Whenever flux- or graph-based criteria are used to study metabolic networks, these analyses are generally centered on the outcome of the network and consider all metabolic compounds to be equivalent in this respect. We generalized the concept of essentiality to metabolites and introduced the concept of the phenotypic essential metabolite (PEM), which influences the growth phenotype according to sustainability, producibility or optimal-efficiency criteria. The exhaustive study of phenotypic essential metabolites in six genome-scale metabolic models suggests that combining and comparing graph-, stoichiometry- and optimal-flux-based criteria allows some features of metabolic network functionality to be deciphered by focusing on a small number of compounds. [C. Frioux, J. Laniau, A. Siegel]
A modeling approach to evaluate the balance between bioactivation and detoxification of MeIQx in human hepatocytes Heterocyclic aromatic amines (HAA), including MeIQx, are environmental and food contaminants that are potentially carcinogenic for humans. Using a computational approach, we developed a numerical model of MeIQx metabolism that predicts the biotransformation of MeIQx through detoxification or bioactivation pathways according to its concentration. Our results demonstrate that CYP1A2 is a key enzyme in the system that regulates the balance between bioactivation and detoxification. This highlights the importance of the complex regulation of enzyme competition, which should be taken into account in any multi-organ model. [V. Delannée, A. Siegel, N. Théret]
caspo: a toolbox for automated reasoning on the response of logical signaling networks families The paper accompanying the complete family of modules introduced in the caspo software was published in 2017 (see the software section for details). [A. Siegel]
Identifying Functional Families of Trajectories in Biological Pathways by Soft Clustering: Application to TGF-beta Signaling
A logic for checking the probabilistic steady-state properties of reaction networks We have constructed a probabilistic analog of flux balance analysis of reaction networks to enable formal verification of logical constraints about the stationary regime of a system, using information from experimental variances and covariances. This is mainly based on a stationary analysis of the probabilistic dynamics relying on a Bernoulli approximation of the reaction network. The analysis requires solving nonlinear optimization problems. [J. Bourdon, A. Siegel]
Better scoring schemes for the recognition of functional proteins by Protomata The machine learning algorithm included in Protomata-learner learns weighted automata representing both functional families from the sequences of amino acids and the possible disjunctions between members. We investigated alternative sequence weighting strategies and null models. We introduced a normalization of the score, and a method to assess the significance of scores, to simplify prediction. Preliminary results show a good improvement of the predictive power of the computed models. [F. Coste]
Detection of mutated primers and impact on targeted metagenomics results In targeted metagenomics, an initial task is the detection, in each sequence, of the primers used for amplifying the targeted region. The selected sequences are then trimmed and clustered in order to inventory the species present in the sample. Common practice consists in retaining only the sequences with perfect primers (i.e. not mutated by sequencing errors). In the context of a study characterizing the biodiversity of tropical soils in unicellular eukaryotes, we implemented the search for mutated primers using the grammatical pattern matching tool Logol, and showed that retrieving sequences with mutated primers has a significant impact on targeted metagenomics results, as it makes it possible to detect more species (7% additional OTUs in our study). [C. Belleannée]
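The recovery step can be sketched as follows (the primer sequence and mismatch threshold are invented for illustration; the actual study expressed the tolerance as a Logol grammar):

```python
# Sketch of mutated-primer recovery: accept a primer occurrence at the
# start of a read if it is within a given Hamming distance of the expected
# primer, instead of requiring an exact match.
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def match_primer(read, primer, max_mut=2):
    """Return the read trimmed of its primer, or None if no match."""
    head = read[:len(primer)]
    if len(head) == len(primer) and hamming(head, primer) <= max_mut:
        return read[len(primer):]
    return None

primer = "GTGCCAGC"
print(match_primer("GTGCCAGCAAACGGT", primer))   # exact primer, trimmed
print(match_primer("GTGTCAGCAAACGGT", primer))   # one substitution, kept
print(match_primer("AACTGGTTCAACGGT", primer))   # too divergent, discarded
```

Reads whose primer carries a sequencing error (second case) are the ones an exact-match policy would silently discard, losing the species they represent.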
First landscape of binding to chromosomes for a domesticated mariner transposase in the human genome In order to study the diversity of genomic targets of the SETMAR protein in two colorectal cell lines, a first task was to exhaustively detect instances of the 80-bp Made1 transposon element in the human genome. For that, we used our grammar-based Logol approach to look for imperfect Made1 instances. In Logol, a pattern can be divided into several sub-patterns; the Made1 model took advantage of this feature to strengthen the most conserved regions. Combining this search with a Blast alignment search made it possible to significantly increase the Made1 annotation of the human genome. [C. Belleannée]
Our software AskOmics was deemed relevant by the Sanofi bio-medical company for facilitating the integration and querying of the data produced by its scientists. A former Dyliss Ph.D. student who designed the first prototypes of AskOmics was recruited by Sanofi. Since then, Sanofi has been part of the AskOmics developer team, and a joint Dyliss-Sanofi CIFRE Ph.D. thesis on the integration of complementary reasoning features into SPARQL queries started in Oct. 2017.
EcoSyst is a Biogenouest inter-regional federating project (Brittany & Pays de la Loire) aiming at the emergence of systems ecology in the western France regions. Drawing on the strengths and skills involved, EcoSyst targets the incubation of new ideas and new projects at disciplinary interfaces. Through this community project, we want to develop skills in ecology, environment, modeling, bioinformatics and systems biology, and their application to organisms and ecosystems of interest in agronomy, marine science and health. EcoSyst also includes the identification of the major issues and concerns, the fundamental and essential methods, and the real needs of the community (training, tools, ...), in order to consider the construction of a community platform (or a service offering within an existing platform) on complex systems modeling that meets the expectations of the community as fully as possible.
Methodologies are developed in close collaboration with the LS2N (fusion of LINA and IRCCyN), located at the University of Nantes and École Centrale de Nantes. This collaboration is formalized through the Biotempo and Idealg ANR projects and the co-development of common software toolboxes with the support of the Renabi-GO platform. C. Trottier is a co-supervised bioanalysis and software development engineer within the Idealg project. M. Chevallier is a co-supervised development and animation engineer within the regional initiative EcoSyst. In addition, the Ph.D. student J. Laniau is co-supervised with a member of the LS2N laboratory. Finally, M. Folschette is a post-doc working on a project aiming at analyzing the evolution of TGF-beta-related pathways after the epithelial-mesenchymal transition in liver cancer, a recognized biological process leading to metastasis. This project is based on a topic shared with the LS2N, the use of graph coloring and reconstruction to witness expression changes, and is funded by the Université Bretagne Loire.
A strong application domain of the Dyliss project is marine biology. This application domain is co-developed with the Station Biologique de Roscoff and its three UMRs, and involves several contracts. Our approach based on parsimonious modelling allowed an in silico characterization of the processes required in sea urchin translation. We are also strongly involved in the IDEALG consortium, a long-term project (10 years, ANR Investissement d'Avenir) aiming at the development of macro-algae biotechnology. Among its research activities, we are particularly interested in the analysis and reconstruction of metabolism and the characterization of key enzymes. Our methods based on combinatorial optimization for the reconstruction of genome-scale metabolic networks, and on the classification of enzyme families based on local and partial alignments, allowed the metabolism of the seaweed E. siliculosus to be deciphered. As a further study, we reconstructed the metabolic network of the symbiont bacterium Ca. P. ectocarpi and used this reconstructed network to decipher interactions within the algal-bacterial holobiont.
We have a strong, long-term collaboration with biologists of the PEGASE and IGEPP units of INRA in Rennes. F. Moreews is a permanent engineer from the PEGASE center hosted in the team to develop methods for integrative biology applied to species of agricultural interest. D. Tagu is a research director at INRA/IGEPP who spends 20% of his time in the team to develop collaborative projects. This partnership has been supported by the co-supervision of Ph.D. students, post-docs and engineers. The collaboration was also reinforced within the ANR contracts MirNadapt and FatInteger.
In collaboration with researchers from the PEGASE center (INRA) focused on breeding animals, we have contributed to several studies aiming at better integrating and investigating data in order to facilitate animal selection and feeding. The NutritionAnalyzer prototype was developed to better understand the impact of several diets or treatments on the composition of the milk of dairy cows. Our work on the identification of upstream regulators within large-scale knowledge databases (prototype KeyRegulatorFinder) and on the semantic-based analysis of metabolic networks was also very valuable for interpreting differences in gene expression in pork meat and identifying the main gene regulators of the response of pigs to several diets.
In addition, constraint-based programming also allows us to decipher regulators of reproduction for the pea aphid, an insect pest of plants, in the framework of the MirnAdapt project. In terms of biological output of the network studies on pea aphid microRNAs, we have identified one new microRNA (apmir-3019, not present in any known species other than the pea aphid) which has more than 900 putative mRNA targets. All these targets, as well as apmir-3019, are differentially expressed between sexual and asexual embryos.
We also have a strong, long-term collaboration in health, namely with the IRSET laboratory at Univ. Rennes 1. N. Théret, research director at INSERM, is hosted in the team to strengthen our collaborative projects. Our collaborations are formalized by the co-supervised Ph.D. theses of V. Delannée, M. Conan (Metagenotox project, funded by Anses) and J. Coquet. This partnership was reinforced by the ANR contract Biotempo, which ended at the end of 2014. In 2015, the project of combining semantic web technologies with bi-clustering classification based on formal concept analysis was applied to systems biology within the PEPS CONFOCAL project. This scientific line has recently been pushed forward in the TGFSYSBio project funded by Plan Cancer, on modelling the microenvironment of the TGF-beta signaling network (P. Vignet was recruited on this contract at the end of 2016).
A new application was initiated in 2017 through a collaboration with Rennes hospital, supported by an Inria-INSERM Ph.D. thesis (M. Louarn).
IDEALG is one of the five laureates of the 2010 national call for Biotechnology and Bioresources and will run until 2020. It gathers 18 different partners from the academic field (CNRS, IFREMER, UEB, UBO, UBS, ENSCR, University of Nantes, INRA, AgroCampus), the industrial field (C-WEED, Bezhin Rosko, Aleor, France Haliotis, DuPont) as well as a technical center specialized in seaweeds (CEVA), in order to foster biotechnology applications in the seaweed field. We are participating in the tasks related to the establishment of a virtual platform for integrating omics studies on seaweeds and the integrative analysis of seaweed metabolism, in cooperation with SBR Roscoff. Major objectives are the building of brown algae metabolic maps, flux analysis and the selection of bacteria symbiotic to brown algae. We will also contribute to the prediction of specific enzymes (sulfatases).
As a partner of the PEPS platform, several teams at Inria Rennes develop generic methods supporting efficient and semantically rich queries for pharmaco-epidemiology studies on medico-administrative databases. The leader is Thomas Guyet (Inria team Lacodam). We showed that semantic web technologies are technically suited for representing patient data from medico-administrative databases as RDF and querying them with SPARQL. We also demonstrated that this approach is relevant, as it supports the combination of patient data with hierarchical knowledge in order to reconcile precise patient data with more general query criteria. This work is mostly conducted by Yann Rivault, whose Ph.D. thesis is supervised by Olivier Dameron and Nolwenn Le Meur (École des Hautes Études en Santé Publique).
The TGFSYSBIO project aims to develop the first model of the extracellular and intracellular TGF-beta system, which should make it possible to analyze the behavior of TGF-beta activity during the course of liver tumor progression and to identify new biomarkers and potential therapeutic targets. Based on a collaboration with Jérôme Feret from ENS Paris, we will combine a rule-based model (Kappa language) describing extracellular TGF-beta activation with a large-scale state-transition-based model (Cadbiom formalism) of TGF-beta-dependent intracellular signaling pathways. The multi-scale integrated model will be enriched with a large-scale analysis of liver tissues using shotgun proteomics to characterize protein networks of the tumor microenvironment, whose remodeling is responsible for extracellular activation of TGF-beta. The trajectories and upstream regulators of the final model will be analyzed with symbolic model checking techniques and abstract interpretation combined with causality analysis. Candidates will be classified with semantic-based approaches and symbolic bi-clustering techniques. The project is funded by the national program "Plan Cancer - Systems Biology" from 2015 to 2018.
Oceans are particularly affected by global change, which causes, for example, increases in average sea temperature and in UV radiation fluxes at the ocean surface, or a shrinkage of nutrient-rich areas. This raises the question of the capacity of marine photosynthetic microorganisms to cope with these environmental changes both in the short term (physiological plasticity) and in the long term (e.g. gene alterations or acquisitions causing changes in fitness in a specific niche). Synechococcus cyanobacteria are among the most pertinent biological models to tackle this question, because of their ubiquity and wide abundance in the field, which allows them to be studied at all levels of organization, from genes to the global ocean.
The SAMOSA project is funded by the ANR from 2014 to 2018 and coordinated by F. Gaczarek at the Station Biologique de Roscoff/UPMC/CNRS. The goal of the project is to develop a systems biology approach to characterize and model the main acclimation (i.e. physiological) and adaptation (i.e. evolutionary) mechanisms involved in the differential responses of Synechococcus clades/ecotypes to environmental fluctuations, in order to better predict their respective adaptability, and hence their dynamics and distribution, in the context of global change. For this purpose, following an intensive omics experimental protocol driven by our colleagues from
The objective of the Mecagenotox project is to characterize and model the ability of the human liver to bioactivate environmental contaminants during chronic liver diseases, in order to assess individual susceptibility to xenobiotics. Indeed, liver pathologies that result in the development of fibrosis are associated with a severe dysfunction of liver functions that may lead to increased susceptibility to contaminants. In this project, funded by ANSES and coordinated by S. Langouet at IRSET/Inserm (Univ. Rennes 1), we will combine cell biology approaches, biochemistry, biophysics, analytical chemistry and bioinformatics to 1) understand how the tension forces induced by the development of liver fibrosis alter the susceptibility of hepatocytes to certain genotoxic chemicals (especially heterocyclic aromatic amines) and 2) model the behavior of xenobiotic metabolism during liver fibrosis. Our main goal is to identify "sensitive" biomolecules in the network and to understand more comprehensively the bioactivation of environmental contaminants involved in the onset of hepatocellular carcinoma.
These projects started in Oct. 2014 and aim at designing a working environment based on workflows to assist molecular biologists in integrating large-scale omics data on non-classical species. The main goal of the workflows is to facilitate the identification of the sets of regulators involved in the response of a species challenged by an environmental stress. Applications target extremophile biotechnologies (biomining) and marine biology (micro-algae).
Microalgae are recognized for the extraordinary diversity of molecules they can contain: proteins, lipids (for biofuel, or long-chain polyunsaturated fatty acids for human health), vitamins, antioxidants, pigments. The project aims at predicting and optimizing the productivity of microalgae. It mainly involves the Inria teams Biocore (PI), Ange and Dyliss. Dyliss is in charge of the identification of physiological functions of microalgae based on their proteomes, which is carried out through the reconstruction of the metabolic network of the microalga T. lutea.
The project aims at identifying the main markers of pathologies through the production and integration of imaging and bioinformatics data. It mainly involves the Inria teams Aramis (PI), Dyliss, Genscale and Bonsai. Dyliss is in charge of facilitating the interoperability of imaging and bioinformatics data.
This project aims at automatically generating abstractions of biological data and knowledge in order to scale up federated queries in the context of Semantic Web technologies. It is a joint project with the Wimmics Inria team.
Partner: Aachen University (Germany)
Title: Modeling the logical response of a signalling network with constraint programming.
We cooperate with the Univ. of Chile (MATHomics, A. Maass) on methods for the identification of biomarkers and on software for biochip design. The aim is to combine automatic reasoning on biological sequences and networks with probabilistic approaches, in order to manage, explore and integrate large sets of heterogeneous omics data into interaction networks from which biomarkers can be produced, with a main application to biomining bacteria. The program was co-funded by Inria and CORFO-Chile from 2012 to 2016. In this context, IntegrativeBioChile was an Associate Team between Dyliss and the Laboratory of Bioinformatics and Mathematics of the Genome hosted at the Univ. of Chile, funded from 2011 to 2016. The collaboration is now supported by Chilean programs.
Niger. University of Maradi [O. Abdou-Arbi]
Poland. Politechnika Wroclawska [W. Dyrka]
India. VIT University, Vellore [K. Lakshmanan]
Chile. University of Chile [A. Siegel, C. Frioux]
Germany. University of Potsdam [L. Bourneuf, 3 months (Nov. 2017 – Jan. 2018)]
SWAT4HCLS (2017): Semantic Web Applications and Tools for Health Care and Life Sciences (O. Dameron)
BBCC (2017): Bioinformatica e Biologia Computazionale in Campania (O. Dameron)
JOBIM (2017): French conference of Bioinformatics (A. Siegel)
SIIM (2017): Symposium sur l'Ingénierie des Informations Médicales (O. Dameron)
ISMB/ECCB 2017.
O. Dameron is an associate editor of the Journal of Biomedical Semantics
J. Bourdon is an academic editor of PLoS One
Briefings in Bioinformatics, (O. Dameron)
Journal of Biomedical Semantics (O. Dameron)
Journal on Data Semantics (O. Dameron)
Molecular Cancer (J. Nicolas)
PLoS One (J. Nicolas)
Paris, Hôpital Lariboisière (seminar, 2017) – SANOFI (Gentilly, 3 invited seminars, 2017) – Clermont-Ferrand (Insect team, 2017) – Nantes (University of Nantes, 2017).
Conference on Boolean networks (Marseille, 2017) – Bioss Meeting on artificial intelligence (Gif, 2017)
Member of the steering committee of the International Conference on Grammatical Inference.
The team was involved in the foundation of a national working group on the symbolic study of dynamical systems, named Bioss [web access]. The group gathers more than 170 scientists, from computer science to biology. It is supported by two French national research networks: bioinformatics (GDR BIM: Bioinformatique Moléculaire) and informatics-mathematics (GDR IM: Informatique Mathématique). The group gathered twice in 2017: for a general meeting in Montpellier (Mar. 2017) and for a workshop on the links between systems biology and artificial intelligence in Orsay (June 2017).
Evaluation panel of the "Europe-USA Call Strengthening Transnational Research in Molecular Plant Sciences" launched by ERA-CAPS.
Institutional boards for the recruitment and evaluation of researchers. Inria national evaluation board (A. Siegel, appointed member). National Council of Universities, section 65 (O. Dameron, appointed member).
Evaluation committees of French laboratories or doctoral schools. Bioinformatics groups of Institut Curie (Paris, presidency of the committee, A. Siegel) – Doctoral school of Nice University (N. Théret).
Presidency of the expert panel for the call Systems biology applied to Cancer of the National Cancer Plan 2017 (A. Siegel).
Recruitment committees. Inria Senior Researchers (national committee, A. Siegel) – Inria Junior Researchers (Nice, National Committee, A. Siegel)
Scientific Advisory Board of the French National Research Network GDR BIM Molecular Bioinformatics (J. Nicolas).
Operational Legal and Ethical Risk Assessment Committee (COERLE) at Inria (J. Nicolas).
Animation of the Bioss working group (A. Siegel).
Board of directors of the French Society for Biology of the Extracellular Matrix (N. Théret).
"Big & Open Data" foresight working group of PROSPER network (F. Coste).
"Prospectives in predictive toxicology" working group at INRA (A. Siegel)
Scientific Advisory Board of Biogenouest (J. Bourdon, N. Théret)
IRISA laboratory (computer science department of Univ. Rennes 1) council (A. Siegel)
In charge of the IRISA laboratory "Health-biology" cross-cutting axis (O. Dameron)
SCAS (Service Commun d'Action Sociale) of Univ. Rennes 1 (C. Belleannée)
Scientific committee of Univ. Rennes 1 school of medicine (O. Dameron, A. Siegel).
Coordination of the doctoral school "Life, Agronomy and Health" of University of Rennes 1 [N. Théret]
Coordination of the master degree "Bioinformatics and genomics", Univ. Rennes 1 [O. Dameron]
Coordination of the sub-domain "From Data to Knowledge: Machine Learning, Modeling and Indexing Multimedia Contents and Symbolic Data", Master in Computer Science, University of Rennes 1, France [F. Coste].
"Atelier bioinformatique", Licence 2 informatique, Univ. Rennes 1 [O. Dameron]
"Bioinformatique pour la génomique", 2nd year school of medicine, Univ. Rennes 1 [O. Dameron]
"Bases de mathématiques et probabilités" and "Méthodes en informatique", Master 1 in public health, Univ. Rennes 1 [O. Dameron]
"Big data and Semantic Web", Master 2 in public health, Univ. Rennes 1 [O. Dameron]
"Intégration: Remise à niveau en informatique", Master 1 in bioinformatics, Univ. Rennes 1 [O. Dameron]
"Programmation en Python", Master 1 in Public Health, Univ. Rennes 1 [O. Dameron]
"Programmation impérative en Python", Master 1 in bioinformatics, Univ. Rennes 1 [O. Dameron]
"Système informatique GNU/Linux", Master 1 in bioinformatics, Univ. Rennes 1 [O. Dameron]
"Semantic Web and bio-ontologies", Master 2 in bioinformatics, Univ. Rennes 1 [O. Dameron]
"e-Santé et réseaux hospitaliers", last year in engineering school ESIR, Univ. Rennes 1, [O. Dameron]
"Equilibre Dynamique de la Communication Cellulaire", Master 2 in Sciences Cellulaires et Moléculaires du Vivant, Univ. Rennes 1 [N. Théret]
Licence: C. Belleannée, Langages formels, 20h, L3 informatique, Univ. Rennes 1, France.
Licence: C. Belleannée, Algorithmique et Programmation Fonctionnelle, 60h, L1 informatique, Univ. Rennes 1, France.
Licence: J. Coquet, Module Programmation Scientifique 1, 20h, L1 informatique, Univ. Rennes 1, France.
Licence: O. Dameron, Biostatistiques, 12h, 1st year school of medicine, Univ. Rennes 1, France
Licence: O. Dameron, C2i niveau 2, 2.5h, 2nd year school of medicine, Univ. Rennes 1, France
Licence: O. Dameron, Bioinformatique pour la génomique, 5h, 2nd year school of medicine, Univ. Rennes 1, France
Licence: C. Frioux, Programmation scientifique Python, 12h, L1, Univ. Rennes 1, France.
Licence: C. Frioux, LaTeX, 12h, L3 ENSAI, France.
Licence: C. Frioux, Outils bureautiques pour le statisticien, 6h, L3 ENSAI, France.
Licence: C. Frioux, Algorithmique et programmation Python, 6h, L3 ENSAI, France.
Licence: L. Bourneuf, Ingénierie Systèmes et Réseaux, 10h, L3 INFO, France.
Licence: L. Bourneuf, Algorithmique des graphes, 8h, L3 INFO, France.
Licence: L. Bourneuf, Algorithmique des graphes, 2h, L3 MIAGE, France.
Master: L. Bourneuf, Principes de Programmation et d'Algorithmique, 6h, M1 BIG, France.
Master: L. Bourneuf, Projet, 10h, M1 BIG, France.
Master: C. Belleannée, Programmation logique avec contraintes et algorithmes génétiques, 40h, M1 informatique, Univ. Rennes 1, France.
Master: C. Belleannée, Algorithmique du texte et bioinformatique, 10h, M1 informatique, Univ. Rennes 1, France.
Master: F. Coste, Apprentissage Automatique Supervisé, 10h, M2 Informatique, Univ. Rennes 1, France
Master: O. Dameron, Object-oriented programming, 20h, M1 bioinformatique et génomique, Univ. Rennes 1, France
Master: O. Dameron, Gestion de projet en informatique, 12h, M1 bioinformatique et génomique, Univ. Rennes 1, France
Master: O. Dameron, Ontologies biomédicales, 6h, Engineering school Institut Mines-Télécom Bretagne-Atlantique Brest, France
Master: O. Dameron, Internship jury, 25h, M1 bioinformatique et génomique, Univ. Rennes 1, France
Master: O. Dameron, Internship jury, 7.5h, M2 bioinformatique et génomique, Univ. Rennes 1, France
Master: O. Dameron, Intégration : remise à niveau en informatique, 14h, M1 bioinformatique, Univ. Rennes 1, France
Master: O. Dameron, Programmation impérative en Python, 39.5h, M1 bioinformatique, Univ. Rennes 1, France
Master: O. Dameron, Système informatique GNU/Linux, 12h, M1 bioinformatique, Univ. Rennes 1, France
Master: O. Dameron, Programmation en Python, 24h, M1 in Public Health, Univ. Rennes 1, France
Master: O. Dameron, Semantic Web and bio-ontologies, 14h, M2 bioinformatique, Univ. Rennes 1, France
Master: O. Dameron, Bases de mathématiques et probabilités, 15h, M1 santé publique, Univ. Rennes 1, France
Master: A. Siegel, Integrative and Systems biology, 20h, M2, Univ. Rennes 1, France
Master: A. Siegel, Introduction to integrative biology, 2h, M2, Univ. Rennes 1, France
PhD : Victorien Delannée, Intégrer les échelles moléculaires et cellulaires dans l'inférence de réseaux métaboliques. Application aux xénobiotiques, started in Oct. 2014, defended in Nov. 2017, supervised by A. Siegel and N. Théret.
PhD : Julie Laniau, Structure de réseaux biologiques : rôle des noeuds internes vis-à-vis de la production de composés, started in Oct. 2013, defended in Oct. 2017, supervised by A. Siegel and D. Eveillard.
PhD : Jean Coquet, Semantic-based reasoning for biological pathways analysis, started in Oct. 2014, defended in Dec. 2017, supervised by O. Dameron and N. Théret.
PhD in progress : Lucas Bourneuf, Justifiable graph decomposition to assist biological network understanding, started in Oct. 2016, supervised by J. Nicolas.
PhD in progress : Clémence Frioux, Using preferences in Answer Set Programming to decipher interactions within the species of an ecosystem at the genomic scale, started in Oct. 2015, supervised by A. Siegel.
PhD in progress : Yann Rivault, Analyse de parcours de soins à partir de bases de données médico-administratives en utilisant des outils du Web Sémantique : identification de complications et de leurs déterminants suite à la pose chirurgicale de dispositif médical implantable en ambulatoire, started in Oct. 2015, supervised by O. Dameron and N. Lemeur.
PhD in progress : Juliette Talibart, Learning grammars with long-distance correlations on proteins, started in Nov. 2017, supervised by F. Coste and J. Nicolas.
PhD in progress : Mael Conan, Predictive approach to assess the genotoxicity of environmental contaminants during liver fibrosis, started in Oct. 2017, supervised by S. Langouet and A. Siegel.
PhD in progress: Marine Louarn, Intégration de données génomiques massives et hétérogènes, application aux mutations non-codantes dans le lymphome folliculaire, started in Oct. 2017, supervised by A. Siegel, T. Fest (CHU) and O. Dameron.
PhD in progress : Méline Wery, Methodology development in disease treatment projects, started in Oct. 2017, supervised by O. Dameron, C. Bettembourg (Sanofi) and A. Siegel.
Member of Ph.D. thesis juries. J. Mercier, Univ. Evry/CEA [A. Siegel, reviewer]. C. Franay, INRA Toulouse [A. Siegel, reviewer]. W. Bedhiafi, Univ. Pierre et Marie Curie Paris + UTM Tunis [O. Dameron]. V. Delannée, Univ. Rennes 1 [N. Théret, O. Dameron]. J. Coquet, Univ. Rennes 1 [O. Dameron, N. Théret]. P. Finet, Univ. Rennes 1 [O. Dameron].
Member of habilitation thesis juries. E. Remy, Univ. Marseille [A. Siegel, president].
Member of medicine doctorate juries. G. Lebailly, Univ. Rennes 1 [O. Dameron].
Internship, from Jun 2017 until Jul 2017. Supervised by J. Nicolas. Student: Alexis Baudin. Subject: Recherche d'attracteurs dans les réseaux booléens synchrones en ASP.
Internship, from Jan until Jun 2017. Supervised by A. Siegel. Student: Mael Conan. Subject: Modélisation et caractérisation de la réponse au stress de la cyanobactérie marine Synechococcus sp. WH7803.
Internship, from Apr 2017 until Jul 2017. Supervised by N. Théret and O. Dameron. Student: Kevin Courtet. Subject: Integration of the miRNA-mediated gene regulatory interaction network from macrophages of patients with cystic fibrosis.
Internship, from Apr 2017 until Jul 2017. Supervised by J. Got. Student: Nicolas Guillaudeux. Subject: Vérifications du réseau métabolique entier de Tisochrysis lutea.
Internship, from Apr 2017 until Jul 2017. Supervised by J. Nicolas and F. Coste. Student: Ali Hassan Kachalo. Subject: Annotation automatique en familles des séquences d'une superfamille d'enzymes, les HAD (haloacides déhalogénases), par Analyse de Concepts Formels (FCA).
Internship, from Apr 2017 until Jul 2017. Supervised by C. Frioux and C. Trottier. Student: Claire Lippold. Subject: Exploration et caractérisation du microbiome associé à Ectocarpus subulatus str. BFT.
Internship, from Jan 2017 until Jun 2017. Supervised by O. Dameron and A. Siegel. Student: Marine Louarn. Subject: Analysis and integration of heterogeneous large-scale genomics data.
Internship, from Apr 2017 until Jul 2017. Supervised by N. Théret and J. Nicolas. Student: Aurelie Nicolas. Subject: Modeling of interaction networks from extracellular matrix components using formal concept analysis.
Internship, from Apr 2017 until Jul 2017. Supervised by C. Belleannée. Student: Dimitri Pedron. Subject: Annotation et prédiction de transcriptome: validation d'ORF alternatifs prédits. Application au gène CREM chez l'humain, la souris et le chien.
Internship, from Feb 2017 until Jun 2017. Supervised by F. Coste. Student: Manon Ruffini. Subject: Better scoring schemes for the recognition of functional proteins by protomata.
Internship, from Feb 2017 until Jun 2017. Supervised by J. Nicolas. Student: Marie Salmon. Subject: Biclustering: quantitative formal concept analysis in Answer Set Programming.
Internship, from Jan 2017 until Jun 2017. Supervised by A. Siegel and O. Dameron. Student: Méline Wery. Subject: Formalizing and computing signatures of phenotypes within a biological network.
Internship, from May 2017 until Jul 2017. Supervised by C. Belleannée. Student: Mohamed Zemmouri. Subject: Analyse de texte en bioinformatique : modélisation grammaticale d'un site ADN, et recherche du site, même dégénéré, sur l'intégralité du génome humain.
We wrote a contribution to the collaborative book edited by CNRS on the main issues of data mining. Our contribution specifically addressed modeling issues arising in ecology with the development of NGS technologies.
Many of our current and former Ph.D. students (A. Antoine-Lorquin, C. Bettembourg, J. Coquet, V. Delannée, G. Garet, S. Prigent) have been heavily involved in the organization of Sciences en Courts (http://sciences-en-courts.fr/), a local popularization festival where Ph.D. students explain their thesis through short movies. The movies are presented to a jury composed of professional artists and scientists, and of high-school students.
Films from previous years can be viewed on the festival website.