Section: Overall Objectives

The research domain of the Dyliss bioinformatics team is sequence analysis and systems biology. Our main goal in biology is to characterize groups of genetic actors that control the phenotypic response of species when challenged by their environment. The team explores methods from the field of formal systems, more precisely knowledge representation, constraint programming, multi-scale analysis of dynamical systems, and machine learning. Our goal is to identify key regulators of the environmental response by structuring and reasoning on information that combines physiological responses measured with omics technologies (RNA-seq, metabolomics, proteomics), genetic information from distantly related species, and knowledge about regulation and metabolic pathways stored in public repositories.

The main challenges we face are data incompleteness and heterogeneity. Rather than searching for a single optimized model, we favor the construction and study of a "space of feasible models or hypotheses" that includes the known constraints and facts on a living system. We develop methods allowing a precise investigation of this space of hypotheses. This puts us in a position to design experimental strategies that progressively shrink the space of hypotheses and improve our understanding of the system. Our models span a broad spectrum of discrete structures: directed graphs, Boolean networks, automata, and expressive grammars.
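As a toy illustration of reasoning over a space of feasible models (a minimal sketch, not the team's actual tooling), consider a hypothetical target gene whose state is a Boolean function of two regulators. The code below enumerates every truth table consistent with a set of observations, then picks the experiment on which the remaining models disagree most. The names `feasible_models` and `most_informative_input`, the two-regulator setup, and the observations are all assumptions made for the example.

```python
from itertools import product

# All joint states of two hypothetical regulators (A, B).
INPUTS = list(product([0, 1], repeat=2))

def feasible_models(observations):
    """Return every truth table (dict: input state -> output) that
    agrees with all observations, given as (state, value) pairs."""
    models = []
    for outputs in product([0, 1], repeat=len(INPUTS)):
        table = dict(zip(INPUTS, outputs))
        if all(table[state] == value for state, value in observations):
            models.append(table)
    return models

def most_informative_input(models):
    """Pick the input state on which the feasible models disagree most
    evenly, so that measuring it discards the most hypotheses."""
    def split(state):
        ones = sum(m[state] for m in models)
        return min(ones, len(models) - ones)
    return max(INPUTS, key=split)

# Two made-up observations: target off when both regulators are off,
# target on when both are on.
observations = [((0, 0), 0), ((1, 1), 1)]
models = feasible_models(observations)
next_experiment = most_informative_input(models)

print(len(feasible_models([])), "models with no data")
print(len(models), "models after two observations")
print("suggested next experiment:", next_experiment)
```

Each additional observation halves the remaining space in this toy setting. Realistic model spaces are far too large for explicit enumeration, which is why constraint-based solvers are used in practice.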

More concretely, the steps of the analysis are to (i) formalize and integrate, in a set of logical or grammatical constraints, both generic knowledge (literature-based regulatory pathways, diversity of molecular functions, DNA patterns associated with molecular mechanisms) and species-specific information (physiological response to perturbation, sequencing...); (ii) investigate the space of admissible models and exhibit its main features by solving combinatorial optimization problems; (iii) identify the corresponding genomic products within sequences. At each of these steps, we rely on symbolic methods for model space exploration: ontologies and formal concept analysis.

We target applications for which large-scale heterogeneous data about a specific but complex physiological phenotype are available. Existing long-term partnerships with biological labs give strong support to this choice. In marine biology, we collaborate closely with the Station biologique de Roscoff (Idealg, Investissement avenir "Bioressources et Biotechnologies"). In environmental microbiology, we collaborate with the CRG in Chile in the framework of the Ciric Chilean Inria center (Ciric-Omics). In agriculture, our main partners are within INRA in Rennes, with a focus on understanding pea aphid microbiology and the metabolism of livestock (pig, chicken, cow). More recently, we have introduced health as a new application field of the team, especially through the study of large-scale Boolean networks and their comparison with knowledge repositories.