Section: Scientific Foundations

Combinatorial models and algorithms

Our research is driven by biological questions. At the same time, we aim to develop well-founded models and algorithms, which is essential to guarantee the generality of our results. Our main background is in discrete combinatorial models and algorithms. Biological macromolecules are naturally modelled by various types of discrete structures: strings, trees, and graphs.

String algorithms are an established research subject of the team. We have been working on spaced seed techniques for several years [22], [23], [24], [33], [35], [27], [26]. The whole technique is implemented and made available in the Yass software for DNA sequence alignment, together with the tools implemented to design seeds [28] (see Section 4).
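To make the idea of a spaced seed concrete, here is a minimal sketch, not the Yass implementation. It assumes a seed written as a string of '1' (position must match) and '0' (position is ignored); the function names seed_key and spaced_seed_hits are hypothetical and chosen for this illustration only.

```python
from collections import defaultdict


def seed_key(text, start, seed):
    """Return the characters of text under the '1' positions of the seed."""
    return "".join(text[start + k] for k, c in enumerate(seed) if c == "1")


def spaced_seed_hits(seq_a, seq_b, seed):
    """List all (i, j) such that seq_a at i and seq_b at j agree on every
    '1' position of the seed. Illustrative only: assumed seed encoding."""
    span = len(seed)
    # Index every window of seq_a by its masked key.
    index = defaultdict(list)
    for i in range(len(seq_a) - span + 1):
        index[seed_key(seq_a, i, seed)].append(i)
    # Look up every window of seq_b in that index.
    hits = []
    for j in range(len(seq_b) - span + 1):
        for i in index.get(seed_key(seq_b, j, seed), []):
            hits.append((i, j))
    return sorted(hits)


# The two sequences differ only at position 2.
a = "ACGTACGT"
b = "ACCTACGT"
contiguous = spaced_seed_hits(a, b, "1111")
spaced = spaced_seed_hits(a, b, "1101")
```

In this toy example the spaced seed "1101" reports a hit at (0, 0) despite the mismatch at position 2, whereas the contiguous seed "1111" does not. Tolerating such isolated mismatches is the basic motivation for spaced seeds.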

Members of the team also have strong expertise in text indexing data structures, which are widely used for the analysis of biological sequences because they allow a data set to be stored and queried efficiently. We proposed an optimal neighborhood indexing for protein similarity search [34] and compressed index structures for DNA sequences [37], [36].
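As a rough illustration of how an index supports efficient queries, the sketch below uses a plain suffix array with binary search. This is a textbook structure chosen for brevity, not the neighborhood index or the compressed indexes cited above; the naive construction is only suitable for short texts, and the function names are hypothetical.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of pattern in text using the
    suffix array sa (two binary searches over the sorted suffixes)."""
    m = len(pattern)
    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


text = "GATTACATTA"
sa = build_suffix_array(text)
positions = find_occurrences(text, sa, "TTA")
```

Once the index is built, each query needs a number of string comparisons logarithmic in the text length, instead of a scan of the whole text.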

Ordered trees and graphs arise naturally when dealing with structural RNAs. Our knowledge in this field has allowed us to make several significant contributions to RNA bioinformatics over the past few years. First, we proposed a new method for RNA structure inference, implemented in a program called caRNAc. Second, we worked on theoretical models for RNA comparison, which led to substantial advances in tree edit distance algorithms [20], [38], [31], tree models [30], [29] and comparison of arc-annotated sequences [18], [17].

Strings, trees and graphs are also useful for studying genomic rearrangements: neighborhoods of genes can be modelled by directed graphs, and genomes by permutations, strings or trees.
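As a small illustration of the permutation view of a genome, the sketch below counts breakpoints, a standard elementary measure of how far a gene order is from the identity order. It is given only as an example of the modelling and is not claimed to be the measure used by the team; it assumes unsigned permutations of 1..n, and the function name is hypothetical.

```python
def count_breakpoints(genome):
    """Count breakpoints of an unsigned permutation of 1..n.

    The permutation is framed with 0 and n + 1; a breakpoint is a pair
    of neighbours that are not consecutive integers.
    """
    n = len(genome)
    if sorted(genome) != list(range(1, n + 1)):
        raise ValueError("genome must be a permutation of 1..n")
    framed = [0] + list(genome) + [n + 1]
    return sum(1 for x, y in zip(framed, framed[1:]) if abs(x - y) != 1)


identity = count_breakpoints([1, 2, 3, 4])
rearranged = count_breakpoints([3, 1, 2, 4])
```

Here the identity order has no breakpoint, while the order 3 1 2 4 has three: between 0 and 3, between 3 and 1, and between 2 and 4.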

The representation of nonribosomal peptides (NRPs) also uses graphs: NRPs are small molecules that have a branching or cyclic structure. We developed several efficient algorithms to compare NRP molecules represented as undirected labeled graphs [19].