Cells are seen as the basic structural, functional and biological units of all living systems. They represent the smallest units of life that can replicate independently, and are often referred to as the building blocks of life. Living organisms are classified as unicellular (the case of most bacteria and archaea) or multicellular (the case of animals and plants). In fact, multicellular organisms such as humans may be seen as composed of native (human) cells, but also of extraneous cells represented by the diverse bacteria living inside the organism. The proportion of the latter relative to native cells is believed to be high: in humans, for example, it has been put at 90%. Multicellular organisms have thus also been described as “superorganisms with an internal ecosystem of diverse symbiotic microbiota and parasites” (Nicholson et al., Nat Biotechnol, 22(10):1268-1274, 2004), where symbiotic means that the extraneous unicellular organisms (cells) live in a close, and in this case long-term, relation both with the multicellular organisms they inhabit and among themselves. On the other hand, bacteria sometimes group into colonies of genetically identical individuals which may acquire both the ability to adhere together and to become specialised for different tasks. An example of this is the cyanobacterium Anabaena sphaerica, which may group to form filaments of differentiated cells, some of which (the heterocysts) are specialised for nitrogen fixation while the others are capable of photosynthesis. Such filaments have been seen as first examples of multicellular patterning.
At its extreme, one could then see life as one collection, or a collection of collections, of genetically identical or distinct self-replicating cells that interact, sometimes closely and for long periods of evolutionary time, with the same or distinct functional objectives. The interaction may be at equilibrium, meaning that it is beneficial or neutral to all, or it may be unstable, meaning that it may be, or become at some time, beneficial only to some and detrimental to other cells or collections of cells. The interaction may involve other living systems, or systems that have been described as being at the edge of life such as viruses, or else genetic or inorganic material such as, respectively, transposable elements and chemical compounds.
The application goal of ERABLE is, through the use of mathematical models and algorithms, to better understand such close and often persistent interactions, with a longer-term objective of becoming able in some cases to suggest the means of controlling or of re-establishing equilibrium in an interacting community by acting on its environment or on its players, how they play and who plays. This goal requires identifying who the partners are in a closely interacting community, who is interacting with whom, how and by which means. Any model is a simplification of reality, but once a model is selected, the algorithms that explore it should address questions that are precisely defined. Whenever possible, they should be exact in the answer, and exhaustive when more than one answer exists, in order to guarantee an accurate interpretation of the results within the given model. This fits well with the mathematical and computational expertise of the team, and drives the methodological goal of ERABLE, which is to contribute substantially and systematically to the field of exact enumeration algorithms for problems that will most often be hard in terms of their complexity, and as such to also contribute to the field of combinatorics inasmuch as this may help in enlarging the scope of application of exact methods.
The key objective is, by constantly crossing ideas from different models and types of approaches, to look for and to infer “patterns”, as simple and general as possible, either at the level of the biological application or in terms of methodology. This objective drives which biological systems are considered, and also which models and in which order, going from simple discrete ones first on to more complex continuous models later if necessary and possible.
ERABLE has two main goals, one related to biology and the other to methodology (algorithms, combinatorics, statistics). In relation to biology, the main goal of ERABLE is to contribute, through the use of mathematical models and algorithms, to a better understanding of close and often persistent interactions between “collections of genetically identical or distinct self-replicating cells” which will correspond to organisms/species or to actual cells. The first will cover the case of what has been called symbiosis, meaning when the interaction involves different species, while the second will cover the case of a (cancerous) tumour which may be seen as a collection of cells which suddenly disrupts its interaction with the other (collections of) cells in an organism by starting to grow uncontrollably.
Such interactions are being explored initially at the molecular level. Although we rely as much as possible on already available data, we intend to also continue contributing to the identification and analysis of the main genomic and systemic (regulatory, metabolic, signalling) elements involved in or impacted by an interaction, and how they are impacted. We have started moving to the population and ecological levels by modelling and analysing the way such interactions influence, and are or can be influenced by, the ecosystem of which the “collections of cells” are a part. The key steps are:
identifying the molecular elements based on so-called omics data (genomics, transcriptomics, metabolomics, proteomics, etc.): such elements may be genes/proteins, genetic variations, (DNA/RNA/protein) binding sites, (small and long non-coding) RNAs, etc.;
simultaneously inferring and analysing the network that models how these molecular elements are physically and functionally linked together for a given goal, or find themselves associated in a response to some change in the environment;
modelling and analysing the population and ecological network formed by the “collections of cells in interaction”, meaning modelling a network of networks (previously inferred or as already available in the literature).
One important longer term goal of the above is to analyse how the behaviour and dynamics of such a network of networks might be controlled by modifying it, including by subtracting some of its components from the network or by adding new ones.
In relation to methodology, the main goal is to provide methods that make it possible to address our main biological objective as stated above, and that lead to the best possible interpretation of the results within a given pre-established model and a well-defined question. Ideally, given such a model and question, the method is exact and also exhaustive if more than one answer is possible. Three aspects are thus involved here: establishing the model within which questions can and will be put; clearly defining such questions; answering them exactly or providing some guarantee on the proximity of the answer given to the “correct” one. We intend to continue contributing to these three aspects:
at the modelling level, by exploring better models that are at the same time richer in terms of the information they contain (as an example, in the case of metabolism, using hypergraphs as models for it instead of graphs) and amenable to easier treatment:
these two objectives (rich models that are at the same time easy to treat) might in many cases be contradictory and our intention is then to contribute to a fuller characterisation of the frontiers between the two;
even when feasible, the richer models may lack a full formal characterisation (this is for instance the case of hypergraphs) and our intention is then to contribute to such a characterisation;
at the question level, by providing clear formalisations of those that will be raised by our biological concerns;
at the answer level:
to extend the area of application of exact algorithms by: (i) a better exploration of the combinatorial properties of the models, (ii) the development of more efficient data structures, (iii) a smarter traversal of the space of solutions when more than one solution exists;
when exact algorithms are not possible, or when there is uncertainty in the input data to an algorithm, to improve the quality of the results through a deeper exploration of the links between different algorithmic approaches: combinatorial, randomised, stochastic.
The goals of the team are biological and methodological, the two being intrinsically linked. Any division into axes along one or the other aspect or a combination of both is thus somewhat artificial. Following the evaluation of the team at the end of 2017, four main axes were identified, the last one being the most recently added. This axis is specifically oriented towards health in general, human or animal. The first three axes are: genomics, metabolism and post-transcriptional regulation, and (co)evolution.
Notice that the division itself is based on the biological level (genomic, metabolic/regulatory, evolutionary) or main current Life Science purpose (health) rather than on the mathematical or computational methodology involved. Any choice involves some arbitrariness. Through the one we made, we wished to emphasise the fact that the area of application of ERABLE is important for us. It does not mean that the mathematical and computational objectives are not equally important, but only that those are, most often, motivated by problems coming from or associated with the general Life Science goal. Notice that such arbitrariness also means that some Life Science topics will be artificially split into two different Axes. One example is genomics and the main health areas currently addressed, which are intrinsically inter-related.
Axis 1: Genomics
Intra- and inter-cellular interactions involve molecular elements whose identification is crucial to understand what governs such interactions, and also what might make it possible to control them. For the sake of clarity, the elements may be classified into two main classes, one corresponding to the elements that allow the interactions to happen by moving around or across the cells, and another corresponding to the genomic regions where contact is established. Examples of the first are non-coding RNAs, proteins, and mobile genetic elements such as (DNA) transposons, retro-transposons, insertion sequences, etc. Examples of the second are DNA/RNA/protein binding sites and targets. Furthermore, both types (effectors and targets) are subject to variation across individuals of a population, or even within a single (diploid) individual. Identification of these variations is yet another topic that we wish to cover. Variations are understood in the broad sense and cover single nucleotide polymorphisms (SNPs), copy-number variants (CNVs), repeats other than mobile elements, genomic rearrangements (deletions, duplications, insertions, inversions, translocations) and alternative splicings (ASs). All three classes of identification problems (effectors, targets, variations) may be put under the general umbrella of genomic functional annotation.
Axis 2: Metabolism and post-transcriptional regulation
As increasingly more data about the interaction of molecular elements (among which those described above) becomes available, these should then be modelled in a subsequent step in the form of networks. This raises two main classes of problems. The first is to accurately infer such networks. Assuming such a network, integrated or “simple”, has been inferred for a given organism or set of organisms, the second problem is then to develop the appropriate mathematical models and methods to extract further biological information from such networks.
The team has so far concentrated its efforts on two main aspects concerning such interactions: metabolism and post-transcriptional regulation by small RNAs. The more special niche we have been exploring in relation to metabolism concerns the fact that the latter may be seen as an organism's immediate window into its environment. Finely understanding how species communicate through those windows, or what impact they may have on each other through them, is thus important when the ultimate goal is to be able to model communities of organisms, for understanding them and possibly, in the longer term, for control. While such communication has been explored in a number of papers, most do so at too high a level or consider only pairs of interacting organisms, not larger communities. The idea of investigating consortia, and in the case of synthetic biology, of using them, has thus started being developed in the last decade only, and was motivated by the fact that such consortia may perform more complicated functions than single populations could, as well as be more robust to environmental fluctuations. Another original feature of the work that the team has been doing in the last decade has been to fully explore the combinatorial aspects of the structures used (graphs or directed hypergraphs) and of the associated algorithms. As concerns post-transcriptional regulation, the team has essentially been exploring the idea that small RNAs may have an important role in the dialog between different species.
Axis 3: (Co)Evolution
Understanding how species that live in a close relationship with others may (co)evolve requires understanding for how long symbiotic relationships are maintained or how they change through time. This may have deep implications in some cases also for understanding how to control such relationships, which may be a way of controlling the impact of symbionts on the host, or the impact of the host on the symbionts and on the environment (by acting on its symbiotic partner(s)). These relationships, also called symbiotic associations, have however not yet been very widely studied, at least not at a large scale.
One of the problems is getting the data, meaning the trees for hosts and symbionts but even prior to that, determining with which symbionts the present-day hosts are associated (or are “infected” by as may be the term used in some contexts) which is a big enterprise in itself. The other problem is measuring the stability of the association. This has generally been done by concomitantly studying the phylogenies of hosts and symbionts, that is by doing what is called a cophylogeny analysis, which itself is often realised by performing what is called a reconciliation of two phylogenetic trees (in theory, it could be more than two but this is a problem that has not yet been addressed by the team), one for the symbionts and one for the hosts with which the symbionts are associated. This consists in mapping one of the trees (usually, the symbiont tree) to the other. Cophylogeny inherits all the difficulties of phylogeny, among which the fact that it is not possible to check the result against the “truth” as this is now lost in the past. Cophylogeny however also brings new problems of its own which are to estimate the frequency of the different types of events that could lead to discrepant evolutionary histories, and to estimate the duration of the associations such events may create.
Axis 4: Human, animal and plant health
As indicated above, this is a recent axis in the team and concerns various applications to human and animal health. In some ways, it overlaps with the three previous axes as well as with Axis 5 on the methodological aspects, but since it has gained more importance in the past few years, we decided to develop these particular applications further. Most of them started through collaborations with clinicians. Such applications are currently focused on three different topics: (i) Infectiology, (ii) Rare diseases, and (iii) Cancer.
Infectiology is the oldest one. It started with a collaboration with Arnaldo Zaha from the Federal University of Rio Grande do Sul in Brazil that focused on pathogenic bacteria living inside the respiratory tract of swine. Since our participation in the H2020 ITN MicroWine, we have also become interested in infections affecting plants, and more particularly vine plants. Rare Diseases, on the other hand, started with a collaboration with clinicians from the Centre de Recherche en Neurosciences of Lyon (CRNL) and is focused on the Taybi-Linder Syndrome (TALS) and on abnormal splicing of U12 introns, while Cancer rests on a collaboration with the Centre Léon Bérard (CLB) and Centre de Recherche en Cancérologie of Lyon (CRCL) which is focused on Breast and Prostate carcinomas and Gynaecological carcinosarcomas.
The latter collaboration was initiated through a relationship between a member of ERABLE (Alain Viari) and Dr. Gilles Thomas, who had been friends for many years. G. Thomas was one of the pioneers of Cancer Genomics in France. After his death in 2014, Alain Viari took on the (part-time) responsibility of his team at CLB and pursued the main projects he had started.
Within Inria and beyond, the first two applications (Infectiology and Rare Diseases) may be seen as unique because of their specific focus (resp. respiratory tract of swine / vine plants on the one hand, and TALS on the other). In the first case, such uniqueness is also related to the fact that the work done involves a strong computational part but also experiments performed within ERABLE itself.
The main areas of application of ERABLE are: (1) biology understood in its more general sense, with a special focus on symbiosis and on intracellular interactions, and (2) health with a special emphasis for now on infectious diseases, rare diseases, and cancer.
Keywords: Bioinformatics - Genomics
Functional Description: The C3Part / Isofun package implements a generic approach to the local alignment of two or more graphs representing biological data, such as genomes, metabolic pathways or protein-protein interactions, in order to infer a functional coupling between them.
Participants: Alain Viari, Anne Morgat, Frédéric Boyer, Marie-France Sagot and Yves-Pol Deniélou
Contact: Alain Viari
URL: http://
Keywords: Bioinformatics - Genomics
Functional Description: Implements methods for the precise detection of genomic rearrangement breakpoints.
Participants: Christian Baudet, Christian Gautier, Claire Lemaitre, Eric Tannier and Marie-France Sagot
Contact: Marie-France Sagot
CO-evolution Assessment by a Likelihood-free Approach
Keywords: Bioinformatics - Evolution
Functional Description: Coala stands for “COevolution Assessment by a Likelihood-free Approach”. It is thus a likelihood-free method for the co-phylogeny reconstruction problem which is based on an Approximate Bayesian Computation (ABC) approach.
Participants: Beatrice Donati, Blerina Sinaimeri, Catherine Matias, Christian Baudet, Christian Gautier, Marie-France Sagot and Pierluigi Crescenzi
Contact: Blerina Sinaimeri
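To illustrate the general principle of Approximate Bayesian Computation named in the description above, the following minimal sketch applies plain rejection sampling to a toy coin-bias problem. It is not Coala's algorithm and says nothing about how Coala simulates cophylogenies: the prior, the simulator, the distance and the tolerance, as well as the names simulate_heads and abc_rejection, are hypothetical choices made only for illustration.

```python
import random


def simulate_heads(p, n_flips, rng):
    """Simulate n_flips coin tosses with head probability p; return the number of heads."""
    return sum(1 for _ in range(n_flips) if rng.random() < p)


def abc_rejection(observed_heads, n_flips, n_draws, tolerance, seed=0):
    """Likelihood-free posterior sample for the head probability.

    Toy assumptions: uniform prior on [0, 1], summary statistic = number of
    heads, distance = absolute difference, fixed tolerance.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()                             # draw a parameter from the prior
        simulated = simulate_heads(p, n_flips, rng)  # simulate data; no likelihood is evaluated
        if abs(simulated - observed_heads) <= tolerance:
            accepted.append(p)                       # keep parameters whose simulation is close
    return accepted


posterior = abc_rejection(observed_heads=70, n_flips=100, n_draws=2000, tolerance=5)
posterior_mean = sum(posterior) / len(posterior)
print(len(posterior), round(posterior_mean, 2))
```

The accepted values approximate the posterior distribution of the parameter; a smaller tolerance gives a better approximation at the price of fewer accepted draws.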
Keywords: Genomics - Algorithm
Functional Description: Given two sequences
Contact: Nadia Pisanti
Keywords: Systems Biology - Bioinformatics
Functional Description: Annotation database system to ease the development and update of enriched BIOCYC databases. CYCADS allows the integration of the latest sequence information and functional annotation data from various methods into a metabolic network reconstruction. Functionalities will be added in future to automate a bridge to metabolic network analysis tools, such as METEXPLORE. CYCADS was used to produce a collection of more than 22 arthropod metabolism databases, available at ACYPICYC (http://
Participants: Augusto Vellozo, Hubert Charles, Marie-France Sagot and Stefano Colella
Contact: Hubert Charles
Keywords: Graph algorithmics - Genomics
Functional Description: DBGWAS is a tool for quick and efficient bacterial GWAS. It uses a compacted De Bruijn Graph (cDBG) structure to represent the variability within all bacterial genome assemblies given as input. The cDBG nodes are then tested for association with a phenotype of interest, and the resulting associated nodes are re-mapped on the cDBG. The output of DBGWAS consists of regions of the cDBG around statistically significant nodes, together with information related to the phenotypes, offering a representation that helps in the interpretation. The output can be viewed with any modern web browser, and thus easily shared.
Contact: Leandro Ishi Soares de Lima
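As a rough illustration of the graph structure mentioned in the description above, the sketch below builds an uncompacted de Bruijn graph from a few toy genomes and records which genomes contain each node. It is a simplification under stated assumptions, not the DBGWAS implementation: reverse complements and the compaction of non-branching paths are ignored, no statistical test is performed, and the names build_de_bruijn and variable_nodes are hypothetical.

```python
from collections import defaultdict


def kmers(sequence, k):
    """Yield every k-mer of a sequence, left to right."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def build_de_bruijn(genomes, k):
    """Return (presence, edges) for a dict {genome_name: sequence}.

    presence maps each k-mer (node) to the set of genomes containing it;
    edges holds (u, v) pairs of consecutive k-mers, which overlap by k - 1.
    Simplifications: no reverse complements, no compaction.
    """
    presence = defaultdict(set)
    edges = set()
    for name, sequence in genomes.items():
        previous = None
        for kmer in kmers(sequence, k):
            presence[kmer].add(name)
            if previous is not None:
                edges.add((previous, kmer))
            previous = kmer
    return dict(presence), edges


def variable_nodes(presence, n_genomes):
    """Nodes absent from at least one genome, i.e. nodes whose presence varies."""
    return {kmer for kmer, names in presence.items() if len(names) < n_genomes}


genomes = {"g1": "ACGTAC", "g2": "ACGGAC"}
presence, edges = build_de_bruijn(genomes, k=3)
candidates = variable_nodes(presence, len(genomes))
```

In this toy input the two genomes share only the node ACG; the six other nodes each occur in a single genome, and it is presence patterns of this kind that an association test against a phenotype would use.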
Keywords: Bioinformatics - Evolution
Functional Description: Eucalypt stands for “EnUmerator of Coevolutionary Associations in PoLYnomial-Time delay”. It is an algorithm for enumerating all optimal (possibly time-unfeasible) mappings of a symbiont tree onto a host tree.
Participants: Beatrice Donati, Blerina Sinaimeri, Christian Baudet, Marie-France Sagot and Pierluigi Crescenzi
Contact: Blerina Sinaimeri
Keywords: Genomics - Algorithm - NGS
Functional Description: Fast-SG enables the optimal hybrid assembly of large genomes by combining short and long read technologies.
Participants: Alex Di Genova, Marie-France Sagot, Alejandro Maass and Gonzalo Ruz Heredia
Contact: Alex Di Genova
Keywords: Bioinformatics - Graph algorithmics - Systems Biology
Functional Description: Designed to solve the metabolic stories problem, which consists in finding all maximal directed acyclic subgraphs of a directed graph
Participants: Etienne Birmelé, Fabien Jourdan, Ludovic Cottret, Marie-France Sagot, Paulo Vieira Milreu, Pierluigi Crescenzi, Vicente Acuna Aguayo and Vincent Lacroix
Contact: Marie-France Sagot
Keywords: Bioinformatics - Genomics
Functional Description: A fast and memory-efficient DP approach for haplotype assembly from long reads that works up to 25x coverage and solves a constrained minimum error correction problem exactly.
Contact: Nadia Pisanti
HyperGraph Library
Keywords: Graph algorithmics - Hypergraphs
Functional Description: The open-source library hglib is dedicated to modelling hypergraphs, which are a generalisation of graphs. In an *undirected* hypergraph, a hyperedge contains any number of vertices. A *directed* hypergraph has hyperarcs which connect several tail and head vertices. This library, which is written in C++, allows user-defined properties to be associated with vertices, with hyperedges/hyperarcs and with the hypergraph itself. It can thus be used for a wide range of problems arising in operations research, computer science, and computational biology.
Release Functional Description: Initial version
Participants: Martin Wannagat, David P. Parsons, Arnaud Mary and Irene Ziska
Contact: Arnaud Mary
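The notion of a directed hypergraph with user-defined properties, as described above, can be sketched in a few lines. The example below is written in Python for brevity and does not reproduce the C++ API of hglib: the class and method names (Hyperarc, DirectedHypergraph, add_hyperarc, out_hyperarcs) are hypothetical and chosen only to show how one hyperarc connects several tail vertices to several head vertices.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Hyperarc:
    """A directed hyperarc: a set of tail vertices and a set of head vertices."""
    name: str
    tails: frozenset
    heads: frozenset


@dataclass
class DirectedHypergraph:
    """Toy directed hypergraph with user-defined properties on vertices and hyperarcs."""
    vertices: dict = field(default_factory=dict)   # vertex -> property dict
    hyperarcs: dict = field(default_factory=dict)  # name -> (Hyperarc, property dict)

    def add_vertex(self, vertex, **properties):
        self.vertices.setdefault(vertex, {}).update(properties)

    def add_hyperarc(self, name, tails, heads, **properties):
        for vertex in list(tails) + list(heads):
            self.add_vertex(vertex)
        self.hyperarcs[name] = (Hyperarc(name, frozenset(tails), frozenset(heads)), properties)

    def out_hyperarcs(self, vertex):
        """Names of the hyperarcs having the vertex among their tails."""
        return sorted(n for n, (arc, _) in self.hyperarcs.items() if vertex in arc.tails)


# A reaction A + B -> C + D is a single hyperarc; a plain graph would need four arcs
# and would lose the fact that A and B are both required.
network = DirectedHypergraph()
network.add_hyperarc("r1", tails=["A", "B"], heads=["C", "D"], reversible=False)
network.add_hyperarc("r2", tails=["C"], heads=["E"])
network.add_vertex("A", compartment="cytosol")
```

Representing each reaction as one hyperarc is what makes this kind of model richer than a graph for metabolism: the joint requirement of all substrates is kept in the structure itself.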
Keywords: Bioinformatics - NGS
Functional Description: KissDE is an R package that tests whether a variant (genomic variant or splice variant) is enriched in a condition. It takes as input a table of read counts obtained from an NGS data pre-processing and gives as output a list of condition-specific variants.
Release Functional Description: This new version improved the recall and made the computation of the effect size more precise.
Participants: Camille Marchet, Aurélie Siberchicot, Audric Cologne, Clara Benoît-Pilven, Janice Kielbassa, Lilia Brinza and Vincent Lacroix
Contact: Vincent Lacroix
Keywords: Bioinformatics - Bioinformatics search sequence - Genomics - NGS
Functional Description: Enables the analysis of RNA-seq data with or without a reference genome. It is an exact local transcriptome assembler, which can identify SNPs, indels and alternative splicing events. It can deal with an arbitrary number of biological conditions, and will quantify each variant in each condition.
Release Functional Description: Improvements: The KissReads module has been modified and sped up, with a significant impact on run times. Parameters: –timeout default now at 10000: in big datasets, recall can be increased while run time is a bit longer. Bugs fixed: –Reads containing only 'N': the graph construction was stopped if the file contained a read composed only of 'N's. This was a silent bug: no error message was produced. –Problems compiling with new versions of MAC OSX (10.8+): KisSplice now compiles with the new default C++ compiler of OSX 10.8+.
KisSplice was applied to a new application field, virology, through a collaboration with the group of Nadia Naffakh at Institut Pasteur. The goal is to understand how a virus (in this case influenza) manipulates the splicing of its host. This led to new developments in KisSplice. Taking into account the strandedness of the reads was required, in order not to mis-interpret transcriptional readthrough. We now use bcalm instead of dbg-v4 for the de Bruijn graph construction and this led to major improvements in memory and time requirements of the pipeline. We still cannot scale to very large datasets like in cancer, the time limiting step being the quantification of bubbles.
Participants: Alice Julien-Laferrière, Leandro Ishi Soares de Lima, Vincent Miele, Rayan Chikhi, Pierre Peterlongo, Camille Marchet, Gustavo Akio Tominaga Sacomoto, Marie-France Sagot and Vincent Lacroix
Contact: Vincent Lacroix
Keywords: Bioinformatics - NGS - Transcriptomics
Functional Description: KisSplice identifies variations in RNA-seq data, without a reference genome. In many applications however, a reference genome is available. KisSplice2RefGenome facilitates the interpretation of the results of KisSplice after mapping them to a reference genome.
Participants: Audric Cologne, Camille Marchet, Camille Sessegolo, Alice Julien-Laferrière and Vincent Lacroix
Contact: Vincent Lacroix
Keywords: Bioinformatics - NGS - Transcriptomics
Functional Description: KisSplice2RefTranscriptome combines the output of KisSplice with the output of a full-length transcriptome assembler, thus making it possible to predict a functional impact for the positioned SNPs, and to intersect these results with condition-specific SNPs. Overall, starting from RNA-seq data only, we obtain a list of condition-specific SNPs stratified by functional impact.
Participants: Helene Lopez Maestre, Mathilde Boutigny and Vincent Lacroix
Contact: Vincent Lacroix
Keywords: Systems Biology - Bioinformatics
Functional Description: Web server for building, curating and analysing genome-scale metabolic networks. MetExplore is also able to deal with data from metabolomics experiments by mapping a list of masses or identifiers onto filtered metabolic networks. Finally, it offers several functions to perform Flux Balance Analysis (FBA). The web server is mature; it was developed in PHP, Java, JavaScript and MySQL. MetExplore was started under another name during Ludovic Cottret's PhD in Bamboo, and is now maintained by the MetExplore group at the Inra of Toulouse.
Participants: Fabien Jourdan, Hubert Charles, Ludovic Cottret and Marie-France Sagot
Contact: Fabien Jourdan
Keywords: Bioinformatics - Computational biology - Genomics - Structural Biology
Functional Description: Predicts, at a genome-wide scale, microRNA candidates.
Participants: Christian Gautier, Christine Gaspin, Cyril Fournier, Marie-France Sagot and Susan Higashi
Contact: Marie-France Sagot
Multi-Objective Metabolic mixed integer Optimization
Keywords: Metabolism - Metabolic networks - Multi-objective optimisation
Functional Description: Momo is a multi-objective mixed integer optimisation approach for enumerating knockout reactions leading to the overproduction and/or inhibition of specific compounds in a metabolic network.
Participants: Ricardo Luiz de Andrade Abrantes, Nuno Mira, Susana Vinga and Marie-France Sagot
Contact: Marie-France Sagot
Mathematical explOration of Omics data on a MetabolIc Network
Keywords: Metabolic networks - Transcriptomics
Functional Description: Moomin is a tool for analysing differential expression data. It takes as its input a metabolic network and the results of a DE analysis: a posterior probability of differential expression and a (logarithm of a) fold change for a list of genes. It then forms a hypothesis of a metabolic shift, determining for each reaction its status as "increased flux", "decreased flux", or "no change". These are expressed as colours: red for an increase, blue for a decrease, and grey for no change. See the paper for full details: https://
Participants: Henri Taneli Pusa, Mariana Ferrarini, Ricardo Luiz de Andrade Abrantes, Arnaud Mary, Alberto Marchetti-Spaccamela, Leen Stougie and Marie-France Sagot
Contact: Marie-France Sagot
Keywords: Systems Biology - Algorithm - Graph algorithmics - Metabolic networks - Computational biology
Functional Description: MultiPus (for “MULTIple species for the synthetic Production of Useful biochemical Substances”) is an algorithm that, given a microbial consortium as input, identifies all optimal sub-consortia to synthetically produce compounds that are either exogenous to it, or are endogenous but where interaction among the species in the sub-consortia could improve the production line.
Participants: Alberto Marchetti-Spaccamela, Alice Julien-Laferrière, Arnaud Mary, Delphine Parrot, Laurent Bulteau, Leen Stougie, Marie-France Sagot and Susana Vinga
Contact: Marie-France Sagot
Keywords: Bioinformatics - Graph algorithmics - Systems Biology
Functional Description: The algorithms in Pitufolandia (Pitufo / Pitufina / PapaPitufo) are designed to solve the minimal precursor set problem, which consists in finding all minimal sets of precursors (usually, nutrients) in a metabolic network that are able to produce a set of target metabolites.
Participants: Vicente Acuna Aguayo, Paulo Vieira Milreu, Alberto Marchetti-Spaccamela, Leen Stougie, Martin Wannagat and Marie-France Sagot
Contact: Marie-France Sagot
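The minimal precursor set problem stated above can be made concrete with a small brute-force sketch. It rests on a simplified definition, which is an assumption and not the formulation used by Pitufo, Pitufina or PapaPitufo: a set of sources is taken to produce the targets when the targets are reached by repeatedly firing every reaction whose substrates are all available, and metabolites regenerated through cycles receive no special treatment. The function names producible and minimal_precursor_sets are hypothetical.

```python
from itertools import combinations


def producible(reactions, available):
    """Forward closure: fire every reaction whose substrates are all available."""
    reached = set(available)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= reached and not products <= reached:
                reached |= products
                changed = True
    return reached


def minimal_precursor_sets(reactions, sources, targets):
    """Enumerate all inclusion-minimal subsets of sources able to produce the targets.

    Brute force over subsets by increasing size (exponential; toy sizes only).
    """
    minimal = []
    for size in range(len(sources) + 1):
        for subset in combinations(sorted(sources), size):
            candidate = frozenset(subset)
            if any(found <= candidate for found in minimal):
                continue  # a smaller solution is already contained in this one
            if targets <= producible(reactions, candidate):
                minimal.append(candidate)
    return minimal


# Each reaction is (set of substrates, set of products), i.e. one hyperarc.
reactions = [
    (frozenset({"a", "b"}), frozenset({"x"})),
    (frozenset({"c"}), frozenset({"x"})),
    (frozenset({"x", "d"}), frozenset({"t"})),
]
solutions = minimal_precursor_sets(reactions, {"a", "b", "c", "d"}, frozenset({"t"}))
```

On this toy network the target t can be obtained from {c, d} or from {a, b, d}, and no proper subset of either suffices. Examining subsets by increasing size guarantees that every set kept is minimal, at an exponential cost that dedicated enumeration algorithms aim to avoid.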
Keywords: Bioinformatics - Graph algorithmics - Systems Biology
Functional Description: Sasita is software for the exhaustive enumeration of minimal precursor sets in metabolic networks.
Participants: Vicente Acuna Aguayo, Ricardo Luiz de Andrade Abrantes, Paulo Vieira Milreu, Alberto Marchetti-Spaccamela, Leen Stougie, Martin Wannagat and Marie-France Sagot
Contact: Marie-France Sagot
Keywords: Algorithm - Genomics
Functional Description: Reconstruction of viral quasi species without using a reference genome.
Contact: Alexander Schonhuth
Keywords: Bioinformatics - Genomic sequence
Functional Description: Motif inference algorithm taking as input a set of biological sequences.
Participant: Marie-France Sagot
Contact: Marie-France Sagot
Keywords: Bioinformatics - Genomics - Sequence alignment
Functional Description: Detects long similar fragments occurring at least twice in a set of biological sequences.
Participants: Nadia Pisanti and Marie-France Sagot
Contact: Nadia Pisanti
Keywords: Bioinformatics - Graph algorithmics - Systems Biology
Functional Description: Both Totoro and Kotoura decipher the reaction changes during a metabolic transient state, using measurements of metabolic concentrations. These are called metabolic hyperstories.
Totoro (for TOpological analysis of Transient metabOlic RespOnse) is based on a qualitative measurement of the concentrations in two steady-states to infer the reaction changes that lead to the observed differences in metabolite pools in both conditions. In the currently available release, pre-processing and post-processing steps are included. After the post-processing step, the solutions can be visualised using Dinghy (http://
Participants: Alice Julien-Laferrière, Ricardo Luiz de Andrade Abrantes, Arnaud Mary, Mariana Ferrarini, Susana Vinga, Irene Ziska and Marie-France Sagot
Contact: Marie-France Sagot
Viral haplotype reconstruction from contigs using variation graphs
Keyword: Haplotyping
Functional Description: The goal of haplotype-aware genome assembly is to reconstruct all individual haplotypes from a mixed sample and to provide corresponding abundance estimates. VG-flow provides a reference-genome-independent solution based on the construction of a variation graph, capturing all quasispecies diversity present in the sample. We solve the contig abundance estimation problem and propose a greedy algorithm to efficiently build full-length haplotypes. Finally, we obtain accurate frequency estimates for the reconstructed haplotypes through linear programming techniques.
Contact: Alexander Schonhuth
Viral haplotype reconstruction from contigs using variation graphs
Keyword: Haplotyping
Functional Description: Viruses populate their hosts as a viral quasispecies: a collection of genetically related mutant strains. Viral quasispecies assembly refers to reconstructing the strain-specific haplotypes from read data, and predicting their relative abundances within the mix of strains, an important step for various treatment-related reasons. Reference-genome-independent (de novo) approaches have yielded benefits over reference-guided approaches, because reference-induced biases can become overwhelming when dealing with divergent strains. While being very accurate, extant de novo methods only yield rather short contigs. Virus-VG aims to reconstruct full-length haplotypes together with their abundances from such contigs, represented as a genome variation graph.
Contact: Alexander Schonhuth
Making the path
Keyword: Genome assembly
Functional Description: Wengan is a new genome assembler that, unlike most current long-read assemblers, entirely avoids the all-vs-all read comparison. The key idea behind Wengan is that long-read alignments can be inferred by building paths on a sequence graph. To achieve this, Wengan builds a new sequence graph called the Synthetic Scaffolding Graph (SSG). The SSG is built from a spectrum of synthetic mate-pair libraries extracted from raw long reads. Longer alignments are then built by performing a transitive reduction of the edges. Another distinctive feature of Wengan is that it performs self-validation by following the read information. Wengan identifies misassemblies at different steps of the assembly process.
Participants: Alex Di Genova and Marie-France Sagot
Contact: Marie-France Sagot
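The transitive reduction mentioned in the Wengan description can be illustrated with a minimal sketch. It assumes a small directed acyclic graph given as a list of edges; the function name and the toy graph are hypothetical and are not taken from the Wengan code base, whose actual Synthetic Scaffolding Graph carries more information than plain edges.

```python
from collections import defaultdict

def transitive_reduction(edges):
    """Return the edges of a DAG that are not implied by a longer path."""
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)

    def reachable_without_direct_edge(src, dst):
        # Depth-first search from src to dst that skips the direct edge src -> dst.
        stack = [w for w in succ[src] if w != dst]
        seen = set(stack)
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            for nxt in succ[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    # An edge is kept only if no alternative path connects its endpoints.
    return sorted((u, v) for u, v in edges
                  if not reachable_without_direct_edge(u, v))

# Hypothetical toy graph: a -> c and a -> d are implied by a -> b -> c -> d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("a", "d")]
reduced = transitive_reduction(toy_edges)
```

On the toy graph, only the chain a, b, c, d survives, which is the sense in which short links are chained into longer ones.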
Keywords: Bioinformatics - Genomics
Functional Description: WhatsHap is a dynamic programming (DP) approach for haplotype assembly from long reads that works up to 20x coverage and solves the minimum error correction problem exactly. pWhatsHap is a parallelisation of the core dynamic programming algorithm of WhatsHap.
Contact: Nadia Pisanti
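To make the minimum error correction (MEC) objective mentioned for WhatsHap concrete, here is a small brute-force sketch. It is not the dynamic programming algorithm of WhatsHap: it simply enumerates all candidate haplotypes, which is only feasible for a handful of sites. It assumes a diploid sample with two complementary haplotypes over heterozygous sites, reads encoded as strings over '0', '1' and '-' (site not covered), and a hypothetical function name.

```python
from itertools import product

def mec_brute_force(reads):
    """Exhaustively minimise the number of corrections needed to split reads into two haplotypes."""
    n_sites = len(reads[0])

    def mismatches(read, hap):
        # Count covered sites where the read disagrees with the haplotype.
        return sum(1 for r, h in zip(read, hap) if r != "-" and r != h)

    best_cost, best_hap = None, None
    for bits in product("01", repeat=n_sites):
        hap = "".join(bits)
        comp = "".join("1" if b == "0" else "0" for b in bits)
        # Each read is assigned to whichever of the two haplotypes it fits best.
        cost = sum(min(mismatches(r, hap), mismatches(r, comp)) for r in reads)
        if best_cost is None or cost < best_cost:
            best_cost, best_hap = cost, hap
    return best_cost, best_hap

# Hypothetical toy instance: the last read carries one sequencing error.
toy_reads = ["0011", "001-", "1100", "-100", "0111"]
cost, haplotype = mec_brute_force(toy_reads)
```

The enumeration is exponential in the number of sites, which is why exact approaches rely on dynamic programming over the reads covering each site instead.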
We present in this section the main results obtained in 2019.
We tried to organise these along the four axes as presented above. Clearly, in some cases, a result overlaps more than one axis. In such cases, we chose the axis that could be seen as the main one concerned by the result.
We chose not to detail here the results on more theoretical aspects of computer science when these are initially addressed in contexts not directly related to computational biology, even though those on string and graph algorithms in general are relevant for life sciences, for instance in pan-genome analysis, or could become more specifically so in the near future. One important example of the latter concerns enumeration algorithms, which have always been at the heart of the computer science and mathematics interests of the team. In this context, the so-called reconfiguration problem, which asks whether one solution can be transformed into another in a step-by-step fashion such that each intermediate solution is also feasible, is of particular relevance. This was explored in the context of a perfect matching problem.
A few other results of 2019 are not mentioned in this report, not because the corresponding work is not important, but because it was likewise more specialised. In the same way, and also for space reasons, we chose not to detail the results presented in some biological papers of the team when these did not require a mathematical or algorithmic input.
On the other hand, we do mention a couple of works that were in preparation or about to be submitted towards the end of 2018.
Transcriptome profiling using Nanopore sequencing Our vision of DNA transcription and splicing has changed dramatically with the introduction of short-read sequencing. These high-throughput sequencing technologies promised to unravel the complexity of any transcriptome. Generally, gene expression levels are well captured using these technologies, but caveats remain due to the limited read length and the fact that RNA molecules have to be reverse transcribed before sequencing. Oxford Nanopore Technologies has recently launched a portable sequencer which offers the possibility of sequencing long reads and, most importantly, RNA molecules. We generated a full mouse transcriptome from brain and liver using such an Oxford Nanopore device. As a comparison, we sequenced RNA (RNA-Seq) and cDNA (cDNA-Seq) molecules using both long- and short-read technologies and tested the TeloPrime preparation kit, dedicated to the enrichment of full-length transcripts. Using spike-in data, we confirmed that expression levels are efficiently captured by cDNA-Seq using short reads. More importantly, Oxford Nanopore RNA-Seq tends to be more efficient, while cDNA-Seq appears to be more biased. We further showed that the cDNA library preparation of the Nanopore protocol induces read truncation for transcripts containing internal runs of T’s. This bias is marked for runs of at least 15 T's, but is already detectable for runs of at least 9 T's and therefore concerns more than 20% of the expressed transcripts in mouse brain and liver. Finally, we outlined that bioinformatic challenges remain ahead for quantifying at the transcript level, especially when reads are not full-length. Accurate quantification of repeat-associated genes such as processed pseudogenes also remains difficult, and we show in the paper that current mapping protocols, which map reads to the genome, largely over-estimate their expression, at the expense of their parent gene.
Genotyping and variant detection The amount of genetic variation discovered and characterised in human populations is huge, and is growing rapidly with the widespread availability of modern sequencing technologies. Such a great deal of variation data, which accounts for human diversity, leads to various challenging computational tasks, including variant calling and genotyping of newly sequenced individuals. The standard pipelines for addressing these problems include read mapping, which is a computationally expensive procedure. A few mapping-free tools were proposed in recent years to speed up the genotyping process. While such tools have highly efficient run-times, they focus on isolated, bi-allelic SNPs, providing limited support for multi-allelic SNPs, indels, and genomic regions with high variant density. To address these issues, we introduced Malva, a fast and lightweight mapping-free method to genotype an individual directly from a sample of reads. Malva is the first mapping-free tool that is able to genotype multi-allelic SNPs and indels, even in high density genomic regions, and to effectively handle a huge number of variants such as those provided by the 1000 Genomes Project. An experimental evaluation on whole-genome data shows that Malva requires one order of magnitude less time to genotype a donor than alignment-based pipelines, while providing similar accuracy. Remarkably, on indels, Malva provides even better results than the most widely adopted variant discovery tools.
Still on the issue of SNP detection, we developed the positional clustering theory that (i) describes how the extended Burrows–Wheeler Transform (eBWT) of a collection of reads tends to cluster together bases that cover the same genome position, (ii) predicts the size of such clusters, and (iii) exhibits an elegant and precise LCP array based procedure to locate such clusters in the eBWT. Based on this theory, we designed and implemented an alignment-free and reference-free SNP calling method, and we devised a SNP calling pipeline. Experiments on both synthetic and real data show that SNPs can be detected with a simple scan of the eBWT and LCP arrays as, in agreement with our theoretical framework, they are within clusters in the eBWT of the reads. Finally, our tool intrinsically performs a reference-free evaluation of its accuracy by returning the coverage of each SNP. Based on the results of the experiments on synthetic and real data, we conclude that the positional clustering framework can be effectively used for the problem of identifying SNPs, and it appears to be a promising approach for calling other types of variants directly on raw sequencing data.
Finally, variant detection and various related algorithmic problems were extensively explored in the PhD of Leandro I. S. de Lima defended in April 2019.
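The general idea of mapping-free genotyping described above for Malva can be sketched as follows. This is a deliberately simplified illustration and not Malva's actual algorithm: it assumes a single known SNP, error-free reads, and a plain presence test on k-mers, whereas real tools work with k-mer counts and a statistical model. All names and the toy sequences are hypothetical.

```python
def kmers(seq, k):
    """Set of all substrings of length k of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def genotype_snp(reference, pos, alleles, reads, k=4):
    """Report the alleles at reference[pos] whose spanning k-mers all occur in the reads."""
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)

    supported = []
    for allele in alleles:
        # Local haplotype carrying this allele; every k-mer of the window spans pos.
        start = max(0, pos - k + 1)
        end = min(len(reference), pos + k)
        window = reference[start:pos] + allele + reference[pos + 1:end]
        signature = kmers(window, k)
        if signature and signature <= read_kmers:
            supported.append(allele)
    return supported

# Hypothetical toy data: a G/T SNP at position 6 of a short reference.
toy_reference = "ACGTACGGTTCA"
het_reads = ["GTACGGTTC", "GTACTGTTC"]  # one read per allele
hom_reads = ["GTACGGTTC"]               # reference allele only
het_call = genotype_snp(toy_reference, 6, ["G", "T"], het_reads)
hom_call = genotype_snp(toy_reference, 6, ["G", "T"], hom_reads)
```

No read is ever aligned: the reads are only decomposed into k-mers once, and each variant is then checked against that set, which is where the speed-up over mapping-based pipelines comes from.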
Bubble generator
Bubbles are pairs of internally vertex-disjoint
Genome assembly
The continuous improvement of long-read sequencing technologies along with the development of ad hoc algorithms has launched a new de novo assembly era that promises high-quality genomes. However, it has proven difficult to use only long reads to generate accurate genome assemblies of large, repeat-rich human genomes. To date, most of the human genomes assembled from long error-prone reads add accurate short reads to further improve the consensus quality (polishing). In a paper to be submitted before the end of 2019 (with A. Di Genova and M.-F. Sagot as main authors), we report the development of an algorithm for hybrid assembly, Wengan, and its application to hybrid sequence datasets from four human samples. Wengan implements efficient algorithms that exploit the sequence information of short and long reads to tackle assembly contiguity as well as consensus quality. We show that the resulting genome assemblies have high contiguity (contig NG50:16.67-62.06 Mb), few assembly errors (contig NGA50:10.9-45.91 Mb), good consensus quality (QV:27.79-33.61), high gene completeness (BUSCO complete: 94.6-95.1%), and consume few computational resources (CPU hours:153-1027). In particular, the Wengan assembly of the haploid CHM13 sample achieved a contig NG50 of 62.06 Mb (NGA50:45.91 Mb), which surpasses the contiguity of the current human reference genome (GRCh38 contig NG50:57.88 Mb). Because of its lower cost, Wengan is an important step towards the democratisation of the de novo assembly of human genomes. Wengan is available at
https://
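For readers unfamiliar with the contiguity statistic quoted above, the NG50 of an assembly is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the (estimated) genome size. A minimal sketch, with a hypothetical function name and made-up contig lengths:

```python
def ng50(contig_lengths, genome_size):
    """Contig length L such that contigs of length >= L cover at least half of genome_size."""
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return 0  # the assembly covers less than half of the genome

# Made-up example: five contigs and an estimated genome size of 200.
toy_contigs = [50, 40, 30, 20, 10]
toy_ng50 = ng50(toy_contigs, 200)
# Using the total assembly length instead of the genome size gives the N50.
toy_n50 = ng50(toy_contigs, sum(toy_contigs))
```

Because NG50 is normalised by the genome size rather than by the assembly size, it allows assemblies of the same genome to be compared even when their total lengths differ.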
On assembly still, although haplotype-aware genome assembly plays an important role in genetics, medicine and various other disciplines, the generation of haplotype-resolved de novo assemblies remains a major challenge. Beyond distinguishing between errors and true sequential variants, one needs to assign the true variants to the different genome copies. Recent work has pointed out that the enormous quantities of traditional NGS read data have been greatly underexploited in terms of haplotig computation so far, which reflects the fact that the methodology for reference-independent haplotig computation has not yet reached maturity. We presented a new approach, called POLYploid genome fitTEr (Polyte), for the de novo generation of haplotigs for diploid and polyploid genomes of known ploidy. Our method follows an iterative scheme where, in each iteration, reads or contigs are joined based on their interplay in terms of an underlying haplotype-aware overlap graph. Along the iterations, contigs grow while preserving their haplotype identity. Benchmarking experiments on both real and simulated data demonstrate that Polyte establishes new standards in terms of error-free reconstruction of haplotype-specific sequences. As a consequence, Polyte outperforms state-of-the-art approaches in various relevant aspects, notably in polyploid settings.
Others Besides the above, we have also explored a proteogenomics workflow for the expert annotation of eukaryotic genomes, as well as a technology- and species-independent simulator of sequencing data and genomic variants.
Multi-objective metabolic mixed integer optimisation with an application to yeast strain engineering
In a paper submitted and already available in bioRxiv (https://
Metabolic shifts
Analysis of differential expression of genes is often performed to understand how the metabolic activity of an organism is impacted by a perturbation. However, because the system of metabolic regulation is complex and not all changes are directly reflected in the expression levels, interpreting these data can be difficult.
We presented a new algorithm and computational tool that uses a genome-scale metabolic reconstruction to infer metabolic changes from differential expression data. Using the framework of constraint-based analysis, our method produces a qualitative hypothesis of a change in metabolic activity. In other words, each reaction of the network is inferred to have increased, decreased, or remained unchanged in flux. In contrast to similar previous approaches, our method does not require a biological objective function and does not assign on/off activity states to genes. An implementation is provided and is available online at the address https://
Metabolic games Game theory is a branch of applied mathematics originally developed to describe and reason about situations where two or more rational agents, the “homo economicus”, are faced with choices and have potentially conflicting goals. All participants want to maximise their own well-being, but are doing so taking into account that everyone else is doing the same. Thus paradoxical, suboptimal outcomes are possible and even common. Evolutionary game theory was born out of the realisation that rational choice can be replaced by natural selection: in the course of evolution, the strategy (phenotype) that would “win” the game would prevail by simply proliferating more successfully thanks to its success in the “game”. It turns out that phenotype prediction in the context of metabolic networks is exactly the type of problem that evolutionary game theory was meant to answer: given a set of choices (as defined by a metabolic network reconstruction), what will be the actual metabolism observed? In other words, if we culture a set of organisms together in a given medium, which phenotype(s) emerge as winners? We sought to provide a short introduction to both evolutionary game theory and its use in the context of metabolic modelling. This work was also part of the PhD of Taneli Pusa.
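The way natural selection can replace rational choice, and still lead to a suboptimal outcome, can be illustrated with the classical replicator equation. The sketch below is a generic textbook illustration and not the method of the paper discussed above: the payoff matrix is a hypothetical prisoner's-dilemma-like game between a "cooperating" and a "defecting" phenotype, and the integration uses a simple Euler scheme with an arbitrarily chosen step size.

```python
def replicator_step(x, payoff, dt=0.01):
    """One Euler step of the replicator equation for strategy frequencies x."""
    n = len(x)
    fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
    mean_fitness = sum(x[i] * fitness[i] for i in range(n))
    # A strategy grows in frequency when its fitness exceeds the population mean.
    new_x = [x[i] + dt * x[i] * (fitness[i] - mean_fitness) for i in range(n)]
    total = sum(new_x)
    return [v / total for v in new_x]

def simulate(x0, payoff, steps=5000, dt=0.01):
    """Iterate the replicator dynamics from the initial frequencies x0."""
    x = list(x0)
    for _ in range(steps):
        x = replicator_step(x, payoff, dt)
    return x

# Hypothetical payoffs: row = own phenotype, column = opponent (0 = cooperate, 1 = defect).
toy_payoff = [[3, 0],
              [5, 1]]
final = simulate([0.9, 0.1], toy_payoff)
mean_payoff = sum(final[i] * toy_payoff[i][j] * final[j]
                  for i in range(2) for j in range(2))
```

Starting from a population with 90% cooperators, defectors take over almost entirely, and the mean payoff ends up close to 1 although a fully cooperating population would obtain 3. This is the kind of paradoxical outcome referred to above.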
Modelling invasion Nowadays, the most used model in studies of the coevolution of hosts and symbionts is phylogenetic tree reconciliation. A crucial issue in this model is that, from a biological point of view, reasonable cost values for an event-based reconciliation are not easily chosen. Different methods have been developed to infer such cost values for a given pair of host and symbiont trees, including one we established in the past. However, a major limitation of these methods is their inability to model the “invasion” of different host species by the same symbiont species (referred to as a spread event), which is often observed in symbiotic relations. Indeed, many symbionts are generalists. For instance, the same species of insect may pollinate different species of plants. In a paper currently in preparation, we propose a method, called AmoCoala, which, for a given pair of host and symbiont trees, estimates the frequency of the cophylogenetic events in the presence of spread events, based on an approximate Bayesian computation (ABC) approach that may be more efficient than a classical likelihood method. On the one hand, the algorithm that we propose provides more confidence in the set of costs to be used for a given pair of host and symbiont trees; on the other hand, it allows the frequency of the events to be estimated even for large datasets. We evaluated our method on both synthetic and real datasets.
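The principle of approximate Bayesian computation can be sketched with a generic rejection sampler. This is only an illustration of the ABC idea and not the AmoCoala algorithm: instead of simulating symbiont trees, the toy simulator below draws the number of times a single event occurs in 100 trials, and the prior, distance, tolerance and all names are assumptions made for the example.

```python
import random

def abc_rejection(observed, simulate, prior_sample, distance, tolerance, n_draws, rng):
    """Keep the parameter draws whose simulated data fall within `tolerance` of the observation."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if distance(simulate(theta, rng), observed) <= tolerance:
            accepted.append(theta)
    return accepted

def simulate_count(p, rng, n_trials=100):
    """Toy simulator: number of trials (out of n_trials) in which an event of probability p occurs."""
    return sum(1 for _ in range(n_trials) if rng.random() < p)

rng = random.Random(0)  # fixed seed so that the sketch is reproducible
posterior = abc_rejection(
    observed=30,                          # hypothetical observation: 30 events in 100 trials
    simulate=simulate_count,
    prior_sample=lambda r: r.random(),    # uniform prior on the event probability
    distance=lambda a, b: abs(a - b),
    tolerance=3,
    n_draws=5000,
    rng=rng,
)
estimate = sum(posterior) / len(posterior)
```

No likelihood is ever evaluated: the accepted draws approximate the posterior distribution of the parameter, and their mean lies close to the observed frequency of 0.3. In a cophylogeny setting, the parameter would be a vector of event frequencies and the simulator would generate symbiont trees, which this sketch does not attempt.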
Co-divergence and tree topology
In reconstructing the common evolutionary history of hosts and symbionts, the current method of choice is the phylogenetic tree reconciliation. In this model, we are given a host tree
Rare disease studies Minor intron splicing plays a central role in human embryonic development and survival. Indeed, biallelic mutations in RNU4ATAC, transcribed into the minor spliceosomal U4atac snRNA, are responsible for three rare autosomal recessive multimalformation disorders named Taybi-Linder (TALS/MOPD1), Roifman (RFMN), and Lowry-Wood (LWS) syndromes, which associate numerous overlapping signs of varying severity. Although RNA-seq experiments have been conducted on a few RFMN patient cells, none have been performed in TALS, and more generally no in-depth transcriptomic analysis of the 700 human genes containing a minor (U12-type) intron had been published yet. We thus sequenced RNA from cells derived from five skin, three amniotic fluid, and one blood biosamples obtained from seven unrelated TALS cases and from age- and sex-matched controls. This allowed us to describe for the first time the mRNA expression and splicing profile of genes containing U12-type introns, in the context of a functional minor spliceosome. Concerning RNU4ATAC-mutated patients, we showed that, as expected, they display distinct U12-type intron splicing profiles compared to controls, but that, rather unexpectedly, the mRNA expression levels are mostly unchanged. Furthermore, although U12-type intron missplicing concerns most of the expressed U12 genes, the level of U12-type intron retention is surprisingly low in fibroblasts and amniocytes, and much more pronounced in blood cells. Interestingly, we found several occurrences of introns that can be spliced using either U2, U12, or a combination of both types of splice site consensus sequences, with a shift towards splicing using preferentially U2 sites in TALS patients' cells compared to controls.
This work is part of the PhD of Audric Cologne defended in October 2019.
Cancer studies Circular RNAs (circRNAs) are a class of RNAs that is under increasing scrutiny, although their functional roles are debated. We analysed RNA-seq data of 348 primary breast cancers and developed a method to identify circRNAs that does not rely on unmapped reads or known splice junctions. We identified 95,843 circRNAs, of which 20,441 were found recurrently. Of the circRNAs that match exon boundaries of the same gene, 668 showed a poor or even negative (R < 0.2) correlation with the expression level of the linear gene. An in silico analysis showed that only a minority (8.5%) of circRNAs could be explained by known splicing events. Both these observations suggest that specific regulatory processes for circRNAs exist. We confirmed the presence of circRNAs of CNOT2, CREBBP, and RERE in an independent pool of primary breast cancers. We identified circRNA profiles associated with subgroups of breast cancers and with biological and clinical features, such as amount of tumour lymphocytic infiltrate and proliferation index. siRNA-mediated knockdown of circCNOT2 was shown to significantly reduce viability of the breast cancer cell lines MCF-7 and BT-474, further underlining the biological relevance of circRNAs. Furthermore, we found that circular, and not linear, CNOT2 levels are predictive for progression-free survival time to aromatase inhibitor (AI) therapy in advanced breast cancer patients, and found that circCNOT2 is detectable in cell-free RNA from plasma. We showed that circRNAs are abundantly present, show characteristics of being specifically regulated, are associated with clinical and biological properties, and thus are relevant in breast cancer.
Other cancer studies have concerned the automatic discovery of a 100-miRNA signature for cancer classification, an integrative and comparative genomic analysis to identify clinically relevant pulmonary carcinoid groups and unveil the supra-carcinoids, and finally the investigation of new therapeutic interventions that are needed to increase the immunogenicity of tumours and overcome the resistance to immunotherapies.
Infection studies Mycoplasma hyopneumoniae is an economically devastating pathogen in the pig farming industry; however, little is known about its relation with the swine host. To improve our understanding of this interaction, we infected epithelial cells with M. hyopneumoniae to identify the effects of the infection on the expression of swine genes and miRNAs. In addition, we identified miRNAs differentially expressed (DE) in the extracellular milieu and in exosome-like vesicles released by infected cells. A total of 1,268 genes and 170 miRNAs were DE post-infection (p<0.05). We identified the up-regulation of genes related to redox homeostasis and antioxidant defense, most of them putatively regulated by the transcription factor NRF2. Down-regulated genes were enriched in cytoskeleton and ciliary function, which could partially explain M. hyopneumoniae induced ciliostasis. Our predictions showed that DE miRNAs could be regulating the aforementioned functions, since we detected down-regulation of miRNAs predicted to target antioxidant genes and up-regulation of miRNAs targeting ciliary and cytoskeleton genes. Based on these observations, M. hyopneumoniae seems to elicit an antioxidant response induced by NRF2 in infected cells; in addition, we propose that ciliostasis caused by this pathogen might be related to down-regulation of ciliary genes. The paper presenting these results has been submitted and is in revision.
Others Besides the above, a first step towards deep learning assisted genotype-phenotype association in whole genome-sized data has been explored in the context of predicting amyotrophic lateral sclerosis.
Title: characterization of hoSt-gut microbiota interactions and identification of key Players based on a unified reference for standardized quantitative metagenOmics and metaboliC analysis frameworK
Industrial Partner: MaatPharma (Person responsible: Lilia Boucinha).
ERABLE participants: Marie-France Sagot (ERABLE coordinator and PhD main supervisor with Susana Vinga from IST, Lisbon, Portugal, as PhD co-supervisor), Marianne Borderes (beneficiary of the PhD scholarship in MaatPharma).
Type: ANR Technology (2018-2021).
Web page: http://
Title: Multi-Omics and Metabolic models iNtegration to study growth Transition in Escherichia coli
Coordinators: Delphine Ropers (EPI Ibis) and Marie-France Sagot
ERABLE participants: Marie-France Sagot and Arnaud Mary.
Type: IXXI Project (2018-2020).
Web page: none for now.
Title: Algorithms and Software for Third gEneration Rna sequencing
Coordinator: Hélène Touzet, University of Lille and CNRS.
ERABLE participants: Vincent Lacroix (ERABLE coordinator), Audric Cologne, Eric Cumunel, Alex di Genova, Leandro I. S. de Lima, Arnaud Mary, Marie-France Sagot, Camille Sessegolo, Blerina Sinaimeri.
Type: ANR (2016-2020).
Web page: http://
Title: Enumération dans les graphes et les hypergraphes : Algorithmes et complexité
Coordinator: D. Kratsch
ERABLE participant(s): A. Mary
Type: ANR (2015-2019)
Web page: http://
Title: Graph Reconfiguration
Coordinator: N. Bousquet
ERABLE participant(s): A. Mary
Type: ANR JCJC (2019-2021)
Web page: Not available
Title: Deciphering host immune gene regulation and function to target symbiosis disturbance and endosymbiont control in insect pests
Coordinator: A. Heddi
ERABLE participant(s): M.-F. Sagot, C. Vieira
Type: ANR (2018-2021)
Web page: Not yet available
Title: Host-microbiota co-adaptations: mechanisms and consequences
Coordinator: F. Vavre
ERABLE participant(s): F. Vavre
Type: ANR PRC (2017-2020)
Web page: Not available
Title: Networks
Coordinator: Michel Mandjes, University of Amsterdam
ERABLE participant(s): S. Pissis, L. Stougie
Type: NWO Gravity Program (2014-2024)
Web page: https://
Title: Rapid Evolution of Symbiotic Interactions in response to STress: processes and mechanisms
Coordinator: N. Kremer
ERABLE participant(s): F. Vavre
Type: ANR JCJC (2017-2020)
Web page: Not available
Title: Worldwide invasion of the Spotted WING Drosophila: Genetics, plasticity and evolutionary potential
Coordinator: P. Gibert
ERABLE participant(s): C. Vieira
Type: ANR PRC (2016-2020)
Web page: Not available
Title: Rôle de l'épissage mineur dans le développement cérébral
Coordinator: Patrick Edery, Centre de Recherche en Neurosciences de Lyon.
ERABLE participants: Vincent Lacroix (ERABLE coordinator), Audric Cologne.
Type: ANR (2018-2021).
Web page: Not available.
Title: Microbial Impact on insect behaviour: from niche and partner selection to the development of new control methods for pests and disease vectors
Coordinator: F. Vavre
ERABLE participant(s): F. Vavre
Type: AO Scientific Breakthrough (2018-2021)
Web page: Not available
Note that we include here national projects of our members from Italy and the Netherlands when these have no partners other than researchers from the same country.
Title: efficient Algorithms for HArnessing networked Data
Coordinator: G. Italiano
ERABLE participant(s): R. Grossi, G. Italiano
Type: MIUR PRIN, Italian Ministry of Education, University and Research (2019-2022)
Title: Combinatorial Methods for analysis and compression of biological sequences
Coordinator: G. Rosone
ERABLE participant(s): N. Pisanti
Type: SIR, MIUR PRIN, Italian Ministry of Research National Projects (2015-2019)
Title: MyOwnResearch: Homogeneous subgroup identification in fatigue management across chronic immune diseases through single subject research design
Coordinator: A. Schönhuth
ERABLE participant(s): A. Schönhuth
Type: Health Holland project (2018-2021)
Web page: Not available
Title: Open Innovation: Digital Innovation for Driving
Coordinator: G. Italiano
ERABLE participant(s): G. Italiano
Type: Bridgestone (2018-2019)
Web page: Not available
Title: Pan-genome Graph Algorithms and Data Integration
Coordinator: Paola Bonizzoni, University of Milan, Italy
ERABLE participant(s): S. Pissis, A. Schönhuth, L. Stougie
Type: H2020 MSCA-RISE (2020-2022)
Web page: Not available
By itself, ERABLE is built from what initially were collaborations with some major European Organisations (CWI, Sapienza University of Rome, Universities of Florence and Pisa, Free University of Amsterdam) and then became a European Inria Team.
Compasso
Title: COMmunity Perspective in the health sciences: Algorithms and Statistical approacheS for explOring it
Duration: 2018, renewable for 2 to 5 more years
Coordinator: On the Portuguese side, Susana Vinga, IST, Lisbon, Portugal; on the French side, Marie-France Sagot
ERABLE participant(s): R. Andrade, M. Ferrarini, G. Italiano, A. Marchetti-Spaccamela, A. Mary, H. T. Pusa, M.-F. Sagot, B. Sinaimeri, L. Stougie, A. Viari, I. Ziska
Web page: http://
ERABLE is coordinator of a CNRS-UCBL-Inria Laboratoire International Associé (LIA) with the Laboratório Nacional de Computação Científica (LNCC), Petrópolis, Brazil. The LIA has for acronym LIRIO (“Laboratoire International de Recherche en bIOinformatique”) and is coordinated by Ana Tereza Vasconcelos from the LNCC and Marie-France Sagot from BAOBAB-ERABLE. The LIA was created in January 2012 for 4 years, renewable once for 4 more years. This year (2019) is the final one. A web page for the LIA LIRIO is available at this address: http://
ERABLE also participates in the Network for Organismal Interactions Research (NOIR), a project funded by Conicyt in Chile within the call International Networking between Research Centers. The project started in 2019 and will last until the end of 2020. The coordinator on the Chilean side is Elena Vidal from the Universidad Mayor, Santiago, Chile, and the ERABLE participants are Carol Moraga Quinteros, Mariana Ferrarini and Marie-France Sagot.
Finally, Marie-France Sagot participates in a Portuguese FCT project, Perseids for “Personalizing cancer therapy through integrated modeling and decision” (2016-2019), with Susana Vinga and a number of other Portuguese researchers. The budget of Perseids is managed exclusively by the Portuguese partner. Perseids ended in December 2019.
In 2019, ERABLE hosted the following international scientists:
In France: Alexandra Carvalho and Susana Vinga, Assistant and Associate professors resp., Instituto Superior Técnico, Lisbon, Portugal; Helisson Faoro, researcher, Instituto Carlos Chagas, Fiocruz, Paraná, Brazil; Ariel Silber, professor, Universidade de São Paulo, Brazil; Arnaldo Zaha, professor at Universidade Federal do Rio Grande do Sul, Brazil.
In Italy: Travis Gagie, Associate professor, Dalhousie University; Nicola Prezza, postdoc, University of Pisa; Elena Arseneva, Assistant professor, St Petersburg State University; Blerina Sinaimeri, Junior Researcher, Inria (see below); Marie-France Sagot, Senior researcher, Inria (see below).
In the Netherlands: Wiktor Zuba, PhD student, University of Warsaw; Lorraine Ayad, Lecturer, King's College London; Grigorios Loukides, Lecturer, King's College London; Martin Farach-Colton, Professor, Rutgers University; Martin Dyer, Professor, University of Leeds.
In 2019, ERABLE in France hosted the following interns:
Phablo Moura, postdoc, University of Campinas, Brazil.
Diego Pérez and Evelyn Sanchéz, PhD students of Elena Vidal, Universidad Mayor, Santiago, Chile.
In the Netherlands, ERABLE hosted the following interns: Luca Denti, University Bicocca of Milano, Italy, from October 2018 to January 2019; Mick van Dijk, TU Delft, from May 2018 to January 2019; Giulia Bernardini, University Bicocca of Milano, Italy, from September 2018 to November 2019.
From July 2019 to June 2020, Blerina Sinaimeri was on Sabbatical at Luiss University to work with Giuseppe Italiano, member of Erable.
In 2019, Marie-France Sagot visited Luiss University for 11 days as Visiting Professor to work with Blerina Sinaimeri, who is on sabbatical there from July 2019 to June 2020, and with Giuseppe Italiano, member of ERABLE. While there, M.-F. Sagot also worked with Alberto Marchetti-Spaccamela from Sapienza University of Rome and from ERABLE.
Giuseppe Italiano is member of the Steering Committee of the Workshop on Algorithm Engineering and Experimentation (ALENEX), of the International Colloquium on Automata, Languages and Programming (ICALP), and of the Workshop/Symposium on Experimental Algorithms (SEA).
Alberto Marchetti-Spaccamela is a member of the Steering Committee of the Workshop on Graph Theoretic Concepts in Computer Science (WG), and of the Workshop on Algorithmic Approaches for Transportation Modeling, Optimization, and Systems (ATMOS).
Arnaud Mary is member of the Steering Committee of Workshop on Enumeration Problems and Applications (WEPA).
Marie-France Sagot is member of the Steering Committee of European Conference on Computational Biology (ECCB), International Symposium on Bioinformatics Research and Applications (ISBRA), and Workshop on Enumeration Problems and Applications (WEPA).
Alexander Schönhuth is member of the Steering committee of the Research in Computational Molecular Biology, satellite conference on massively parallel sequencing (RECOMB-seq).
Leen Stougie was co-organiser of MAPSP 2019, Jun 2019, Hotel Zeeuwse Stromen, Renesse; and of the Networks Workshop on Random graphs, counting and sampling, Sep 2019, CWI, Amsterdam.
Giuseppe Italiano was a member of the Program Committee of APF, ATMOS, and CIAC.
Arnaud Mary was a member of the Program Committee of MFCS, and WEPA.
Nadia Pisanti was a member of the Program Committee of BIOINFORMATICS, CPM, ICCS, ISBRA, IWOCA, and WABI.
Marie-France Sagot was a member of the Program Committee of BIBM, CIAC, CPM, PSC, RecombCG, and WABI.
Members of ERABLE have reviewed papers for a number of workshops and conferences including: CPM, ISMB, RECOMB, WEPA, WABI.
Roberto Grossi is member of the Editorial Board of Theory of Computing Systems (TOCS) and of RAIRO – Theoretical Informatics and Applications.
Giuseppe Italiano is member of the Editorial Board of Algorithmica and Theoretical Computer Science.
Vincent Lacroix is recommender for Peer Community in Genomics, see https://
Alberto Marchetti-Spaccamela is member of the Editorial Board of Theoretical Computer Science.
Arnaud Mary is Editor-in-Chief of a special issue of Discrete Applied Mathematics dedicated to WEPA 2016.
Nadia Pisanti has been a member of the Editorial Board of International Journal of Computer Science and Application (IJCSA) since 2012, and of Network Modeling Analysis in Health Informatics and Bioinformatics since 2017.
Marie-France Sagot is a member of the Editorial Board of BMC Bioinformatics, Algorithms for Molecular Biology, and Lecture Notes in BioInformatics.
Leen Stougie is a member of the Editorial Board of AIMS Journal of Industrial and Management Optimization.
Cristina Vieira is Executive Editor of Gene, and has been a member of the Editorial Board of Mobile DNA since 2014.
Members of ERABLE have reviewed papers for a number of journals including: Theoretical Computer Science, Algorithmica, Algorithms for Molecular Biology, Bioinformatics, BMC Bioinformatics, Genome Biology, Genome Research, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), Molecular Biology and Evolution, Nucleic Acids Research.
Giuseppe Italiano: Invited talk on “2-Connectivity on Directed Graphs”, 14th Computer Science Symposium in Russia (CSR 2019), Novosibirsk, Russia.
Nadia Pisanti: Invited talk on “Mapping Reads on a Pan-Genome: Pattern Matching on Degenerate Texts”, 1st Workshop on Computational Pan-Genomics, Bielefeld, Germany; Invited talk on “On-line (approximate) Pattern Matching on Degenerate Texts and Applications”, 14th Workshop on Compression, Text and Algorithms (WCTA), Segovia, Spain.
Solon Pissis: Invited talk on “Even Faster Elastic-Degenerate String Matching via Fast Matrix Multiplication”, Algorithms Group Seminar Series, 24 Oct 2019, University of Warsaw, Warsaw, Poland; Invited talk on “When linear space is impractical: computing absent words in output-sensitive space”, Bonsai Bioinformatics Seminar Series, 11 Jun 2019, Université de Lille, Lille, France; Invited talk on “Elastic-degenerate strings: a new representation for pattern matching in a collection of similar texts”, Computer Science Seminar Series, 12 Feb 2019, University of Pisa, Pisa, Italy.
Leen Stougie: Invited talk on “Fixed-Order Scheduling on parallel machines”, Workshop on Combinatorial Optimization, 26-27 September 2019, TU Berlin, Germany.
Cristina Vieira: Invited talk on “Contribution of Transposable element to gene expression in Drosophila (and other)”, XI Symposium of Ecology, Genetic and Drosophila Evolution, November 2019, Pelotas, Brazil.
Giuseppe F. Italiano is a member of the Council of the European Association for Theoretical Computer Science. Leen Stougie is a member of the General Board of the Dutch Network on the Mathematics of Operations Research (Landelijk Netwerk Mathematische Besliskunde, LNMB).
Hubert Charles is director of the Biosciences Department of the Insa-Lyon and co-director of studies of the “Bioinformatique et Modélisation (BIM)” track.
Giuseppe Italiano is a member of the Advisory Board of MADALGO - Center for MAssive Data ALGOrithmics, Aarhus, Denmark.
Nadia Pisanti has been a member of the Board of the PhD School in Data Science (University of Pisa jointly with Scuola Normale Superiore Pisa, Scuola S. Anna Pisa, IMT Lucca) since 1 November 2017.
Marie-France Sagot is a member of the Advisory Board of CWI, Amsterdam, the Netherlands, and chair of the CSS for MBIO at Inra.
Alexander Schönhuth has been a member of the Scientific Board of BioSB (the Dutch organisation for bioinformatics) since May 2017.
Leen Stougie has been Leader of the Life Science Group at CWI since April 2017. He is a member of the General Board of the Dutch Network on the Mathematics of Operations Research (Landelijk Netwerk Mathematische Besliskunde, LNMB), and a member of the Management Team of the Gravity project Networks.
Alain Viari is a member of a number of scientific advisory boards (IRT (Institut de Recherche Technologique) BioAster; Centre Léon Bérard). He also coordinates, together with J.-F. Deleuze (CNRGH-Evry), the Research & Development part (CRefIX) of the “Plan France Médecine Génomique 2025”.
Fabrice Vavre is President of Section 29 of the CoNRS.
Cristina Vieira is a member of the “Conseil National des Universités” (CNU) 67 (“Biologie des Populations et Écologie”), and has been a member of the “Conseil de la Faculté des Sciences et Technologies (FST)” of the University Lyon 1 since 2017.
The members of ERABLE teach both at the Department of Biology of the University of Lyon (in particular within the BISM (BioInformatics, Statistics and Modelling) specialty) and at the Department of Bioinformatics of the Insa (National Institute of Applied Sciences).
Cristina Vieira is responsible for the Master Biodiversity, Ecology and Evolution (https://
The ERABLE team regularly welcomes M1 and M2 interns from the bioinformatics Master.
Vincent Lacroix and Audric Cologne were instructors in the NGS data analysis training for the CNRS Formation, a course coordinated by Annabelle Haudry, LBBE
(https://
All French members of the ERABLE team are affiliated to the doctoral school E2M2 (Ecology-Evolution-Microbiology-Modelling, http://
Italian researchers teach between 90 and 140 hours per year, at both the undergraduate and Master levels. The teaching covers pure computer science courses (such as Programming foundations, Programming in C or in Java, Computing Models, Distributed Algorithms) and computational biology (such as Algorithms for Bioinformatics).
Dutch researchers teach between 60 and 100 hours per year, again at the undergraduate and Master levels, in applied mathematics (e.g. Operational Research, Advanced Linear Programming), machine learning (Deep Learning) and computational biology (e.g. Biological Network Analysis, Algorithms for Genomics).
The following PhDs were defended in ERABLE in 2019:
Jasmijn Baaijens, CWI (supervisor: Alexander Schönhuth), Sep 2019
Annelieke Baller, Vrije Universiteit Amsterdam (co-supervisor: Leen Stougie), Nov 2019
Thomas Bosman, Vrije Universiteit Amsterdam (co-supervisor: Leen Stougie), Nov 2019
Audric Cologne, University of Lyon 1 (funded by Inserm and Inria, co-supervisors: Patrick Edery – Federation of Health Research of Lyon-Est, Vincent Lacroix), Oct 2019
Leandro Ishi Soares de Lima, University of Lyon 1 (funded by the Brazilian “Science without Borders” program, co-supervisors: Giuseppe Italiano, Vincent Lacroix, Marie-France Sagot), Apr 2019
Nikos Parotsidis, University of Rome Tor Vergata (supervisor: Giuseppe Italiano), Mar 2019
Henri Taneli Pusa, University of Lyon 1 (funded by H2020-MSCA-ETN-2014 project MicroWine, co-supervisors: Alberto Marchetti-Spaccamela, Arnaud Mary, Marie-France Sagot), Feb 2019
The following are the PhDs in progress:
Marianne Borderes, University Lyon 1 (funded by ANR Technology Spock, co-supervisors: Susana Vinga – Instituto Superior Técnico at Lisbon; Marie-France Sagot)
Nicolas Homberg, Inra, Inria & University of Lyon 1 (funded by Inra & Inria, co-supervisors: Christine Gaspin at Inra; Marie-France Sagot)
Carol Moraga Quinteros, University of Lyon 1 (funded by Conicyt Chile, co-supervisors: Rodrigo Gutierrez – Catholic University of Chile, Marie-France Sagot)
Camille Sessegolo, University of Lyon 1 (funded by ANR Aster; co-supervisors: Vincent Lacroix, Arnaud Mary)
Michelle Sweering, CWI (co-supervisors: Solon Pissis and Leen Stougie)
Yishu Wang, University Lyon 1 (funded by Ministère de l'Enseignement supérieur, de la Recherche et de l'Innovation, co-supervisors: Mário Figueiredo – Instituto Superior Técnico at Lisbon; Marie-France Sagot; Blerina Sinaimeri)
Irene Ziska, University Lyon 1 (funded by Inria Cordi-S, co-supervisors: Susana Vinga – Instituto Superior Técnico at Lisbon; Marie-France Sagot)
The following are the PhD or HDR juries in which members of ERABLE participated in 2019:
Vincent Lacroix: External reviewer of the PhD of Patricia Sieber, supervised by Stefan Schuster at Friedrich-Schiller University of Jena, Germany; external reviewer of the PhD of Luca Denti, supervised by Paola Bonizzoni at University Bicocca of Milano, Italy.
Arnaud Mary: External reviewer of the PhD of Karima Ennaoui, supervised by Lhouari Nourine at University of Clermont-Ferrand, France.
Marie-France Sagot: External Reviewer of the PhD of Pierre Marijon, University of Lille, France, Dec 2019.
Leen Stougie: Reading Committee of the PhD of Teun Janssen, TU Delft, Mar 2019; Chair Reading Committee of the PhD of Pieter Kleer, Vrije Universiteit Amsterdam, Sep 2019; Reading Committee of the PhD of Peter van der Gulik, Univ. of Amsterdam, Sep 2019; Chair Reading Committee of the PhD of Maaike Hoogeboom, Vrije Universiteit Amsterdam, Dec 2019.
Cristina Vieira: Member of the PhD Committee of Olivier Tabone, Faculté de Médecine Rockefeller, Jan 2019; Member of the PhD Committee of Sébastien Lemaire, ENS Lyon, Mar 2019; External Reviewer of the PhD of Natalia Martinez, Université Paris Sud, Oct 2019.
Carol Moraga Quinteros participated in the contest “DESCRYPThèse” of the doctoral school E2M2 of the University of Lyon 1, winning a prize for one of the best presentations in April 2019. The title of the talk was “BrumiR: un algorithme de novo pour prédire les petits ARNs sans génome de référence”.