Genomic Exploration of the Hemiascomycetous Yeasts: 4. The genome of Saccharomyces cerevisiae revisited

MAGNOME Models and Algorithms for the Genome

Computational Sciences for Biology, Medicine and the Environment

Computational Biology and Bioinformatics

MAGNOME is an INRIA Project-Team joint with University of Bordeaux and CNRS (LaBRI, UMR 5800)

David James Sherman (INRIA, researcher, Bordeaux): Team leader; INRIA Senior Research Scientist (DR); HDR: yes

Marie Sanchez (INRIA, assistant, Bordeaux): INRIA

Pascal Durrens (CNRS, researcher, Bordeaux): CNRS, Research scientist (CR); HDR: yes

Macha Nikolski (CNRS, researcher, Bordeaux): CNRS, Research scientist (CR); HDR: yes

Elisabeth Bon (French university, faculty, Bordeaux): University Bordeaux, Associate Professor (MCF)

Tiphaine Martin (CNRS, technical staff, Bordeaux): CNRS, Research engineer

Alice Garcia (INRIA, technical staff, Bordeaux): Contract engineer for BioRica ADT

Rodrigo Assar-Cuevas (INRIA, PhD student, Bordeaux): CORDI-S INRIA, since Oct. 2008

Natalia Golenetskaya (INRIA, PhD student, Bordeaux): CORDI-S INRIA, since Oct. 2009

Nicolás Loira (INRIA, PhD student, Bordeaux): CONICYT Chile, since Mar. 2007

Anasua Sarkar (INRIA, PhD student, Bordeaux): EMMA co-reg. Jadavpur University, since Oct. 2009

Hayssam Soueidan (French university, PhD student, Bordeaux): MENSR University Bordeaux, since Mar. 2006

Adrien Goëffon (INRIA, postdoc, Bordeaux): INRIA, until August 2009

Julie Bourbeillon (INRIA, postdoc, Bordeaux): ATER University Bordeaux, until August 2009

Géraldine Jean (French university, postdoc, Bordeaux): ATER University Bordeaux

Grégoire Sutre (CNRS, external collaborator, Bordeaux): CNRS, Research scientist (CR)

Nikolai Vyahhi (foreign university, visitor, Bordeaux): University of St. Petersburg

Razanne Issa (French university, external collaborator, Bordeaux): Syrian exchange teacher at U. Bordeaux

Overall Objectives

One of the key challenges in the study of biological systems is understanding how the static information recorded in the genome is interpreted to become dynamic systems of cooperating and competing biomolecules. MAGNOME addresses this challenge through the development of informatic techniques for multi-scale modeling and large-scale comparative genomics:

logical and object models for knowledge representation

stochastic hierarchical models for behavior of complex systems, formal methods

algorithms for sequence analysis, and

data mining and classification.

We use genome-scale comparisons of eukaryotic organisms to build modular and hierarchical hybrid models of cell behavior that are studied using multi-scale stochastic simulation and formal methods. Our research program builds on our experience in comparative genomics, modeling of protein interaction networks, and formal methods for multi-scale modeling of complex systems.

Highlights of the year

In collaboration with the Génolevures Consortium and Washington University at St. Louis, MAGNOME completed a large-scale study of five complete yeast genomes from the Saccharomycetaceae, implicated in various biotechnological applications. These species have been baptised “protoploid” because they are the best contemporary genomes representing the ancestral chromosome number for this phylogenetic branch. Nearly all of MAGNOME's core methodologies were brought on line for this study: genome annotation and analysis, median and ancestral genome reconstruction, and data integration and web deployment.

In collaboration with the Institute of Wine and Vine Science, MAGNOME improved understanding of the relation between genome variation and efficiency of cell factory microorganisms used in wine making.

Macha Nikolski (CR1 CNRS) of MAGNOME defended her HDR.

Scientific Foundations

Fundamental questions in the life sciences can now be addressed at an unprecedented scale through the combination of high-throughput experimental techniques and advanced computational methods from the computer sciences. The new field of computational biology or bioinformatics has grown around intense collaboration between biologists and computer scientists working towards understanding living organisms as systems. One of the key challenges in this study of systems biology is understanding how the static information recorded in the genome is interpreted to become dynamic systems of cooperating and competing biomolecules.

Magnome addresses this challenge through the development of informatic techniques for multi-scale modeling and large-scale comparative genomics: data models for knowledge representation, stochastic hierarchical models for behavior of complex systems, algorithms for genome analysis, and data mining and classification. Our research program builds on our experience in comparative genomics, data mining and classification, and formal methods for multi-scale stochastic modeling of complex systems.

The first overall goal for Magnome is to develop methods for understanding the structure and history of eukaryote genomes, in order to identify their differences and the link between these differences and the dynamic behavior of these organisms. The central dogma of evolutionary biology postulates that contemporary genomes evolved from a common ancestral genome, but the large-scale study of their evolutionary relationships is frustrated by the unavailability of these ancestral organisms, which have long disappeared. However, this common inheritance allows us to discover these relationships through comparison, to identify those traits that are common and those that are novel inventions since the divergence of different lineages.

We develop novel techniques to address fundamental questions of mechanisms of gene dynamics, and the ways that genes and their products are organized at different scales. These results are then combined into integrated models through the organization of these objects into networks and pathways that can be used to predict the dynamic behavior of cells. Through combinatorial optimization we can construct plausible hypotheses about the structure of ancestral genome architectures, which may provide deep insight both into the past histories of particular genomes and the general mechanisms of their formation.

The methods designed by Magnome for comparative genome annotation, structured genome comparison, and construction of integrated models are applied on a large scale to yeasts from the hemiascomycete class, which provide a unique tool for studying eukaryotic genome evolution over a broad range of distances. With their relatively small and compact genomes, yeasts offer a unique opportunity to explore eukaryotic genome evolution by comparative analysis of several species. Yeasts are widely used as cell factories, for the production of beer, wine and bread and more recently of various metabolic products such as vitamins, ethanol, citric acid, lipids, etc. Yeasts can assimilate hydrocarbons, depolymerise tannin extracts, and produce hormones and vaccines in industrial quantities through heterologous gene expression. Several yeast species are pathogenic for humans. The hemiascomycetous yeasts represent a homogeneous phylogenetic group of eukaryotes with a relatively large physiological and ecological diversity.

The second overall goal for Magnome is to use theoretical results from formal methods to define a mathematical framework in which discrete and continuous models can communicate with a clear semantics. We exploit this to develop the BioRica platform, a modeling middleware in which hierarchical models can be assembled from existing models. Such models are translated into their execution semantics and then simulated at multiple resolutions through multi-scale stochastic simulation.

A general goal of systems biology is to acquire a detailed quantitative understanding of the dynamics of living systems. Different formalisms and simulation techniques are currently used to construct numerical representations of biological systems, and a certain wealth of models is proposed using specific and ad hoc methods. A recurring challenge is that hand-tuned, accurate models tend to be so focused in scope that it is difficult to repurpose them. Instead of modeling each process de novo, we claim that a sustainable effort in building efficient behavioral models must proceed incrementally. Hierarchical modeling (R. Alur et al., Generating embedded software from hierarchical hybrid models, in Proceedings of LCTES, pp. 171–82, 2003) is one way of combining specific models into networks. Effective use of hierarchical models requires both formal definition of the semantics of such composition, and efficient simulation tools for exploring the large space of complex behaviors.

Hierarchical modeling that integrates both genome-scale models of metabolism and fine-grained models of particular processes of interest in a given application is recognized as a major challenge in systems biology by the European Union (see “Systems biology: a grand challenge for Europe,” ESF Grand Challenges, Sept. 2007). Furthermore, the NSF in the United States has recognized since 2004 that multi-scale modeling that integrates all scales, from the molecular through the population level, is the way for modeling to impact the understanding of biological processes (see, for example, NSF 04-607).

The Magnome BioRica system is a high-level modeling framework integrating discrete and continuous multi-scale dynamics within the same semantics domain, while offering an easy-to-use and computationally efficient numerical simulator. It is based on a generic approach that captures a range of discrete and continuous formalisms and admits a precise operational semantics. On the practical level, BioRica models are compiled into a discrete event formalism capable of capturing discrete, continuous, stochastic, non-deterministic and timed behaviors in an integrated and unambiguous way.

Our long-term goal is to develop a methodology in which we can assemble a model for a species of interest using a library of reusable models and an organism-level “schematic” determined by comparative genomics.

MAGNOME's short- and mid-term objectives can be described as follows:

Comparative genome annotation

We develop efficient methodologies and a software platform for associating biological information with complete genome sequences, in the particular case where several phylogenetically related eukaryote genomes are studied simultaneously.

Phylogenetic protein families establish relations of conservation and lineage-specific gain and loss that permit the detailed study of adaptation and functional specialization. Algorithmic techniques must be developed to improve precision across million-year phylogenetic ranges. Two challenges must be addressed in the classification methods: better definition of inclusion relations, and incorporation of gene fusion and fission events, which induce reticulate relations between family classifications.

Rather than compare flat sets of genes grouped into functional classes, structured comparison explores the topological structure of the graph of relations between genes. Biomolecular networks are one way to perform such comparisons. Using graph-theoretic techniques we can assess the relative conservation of networks from one species to another, with the aim of identifying functional differences between the species.
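As a minimal sketch of what a graph-theoretic conservation measure can look like, the following Python fragment computes the fraction of interactions in one species whose ortholog-mapped counterpart is also an interaction in a second species. It is an illustration only, not the measure used by the team: the function name, the gene identifiers and the one-to-one ortholog map are all hypothetical.

```python
def conserved_edge_fraction(edges_a, edges_b, orthologs):
    """Fraction of mappable species-A edges whose image is an edge in species B.

    edges_a, edges_b: iterables of (gene, gene) undirected interactions.
    orthologs: dict mapping species-A genes to species-B genes (assumed one-to-one).
    """
    # Undirected edges are stored as frozensets so (u, v) and (v, u) compare equal.
    b_edges = {frozenset(edge) for edge in edges_b}
    mappable = 0
    conserved = 0
    for u, v in edges_a:
        # Only edges with both endpoints mapped can be tested for conservation.
        if u in orthologs and v in orthologs:
            mappable += 1
            if frozenset((orthologs[u], orthologs[v])) in b_edges:
                conserved += 1
    return conserved / mappable if mappable else 0.0


# Hypothetical toy data: three interactions in species A, two in species B.
edges_a = [("a1", "a2"), ("a2", "a3"), ("a3", "a4")]
edges_b = [("b2", "b1"), ("b3", "b4")]
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4"}
score = conserved_edge_fraction(edges_a, edges_b, orthologs)
```

In this toy case two of the three mappable interactions are conserved. A low score for a given subnetwork would be one possible indication of a functional difference worth examining.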

The computational and storage needs of large-scale global comparison of genomes require a dedicated integrated platform for knowledge representation, high-performance computing and software development. A complete analysis chain for new genomes must start from a genome sequence and produce a preliminary annotation, including prediction of genes, putative assignment to protein families, and application of coherency rules. These tools must take into account specificities of fungal genomes such as clade-specific gene architectures, lineage-specific protein families and pathways, and known phylogenetic relationships.

We validate these methodological advances through application to sets of species of biotechnological interest, in collaboration with our biological partners. Magnome manages a comprehensive comparison of eighteen yeast genomes, annotated by the Génolevures Consortium. This annotation effort by 40 scientists in France and Belgium has resulted in a complete catalogue of protein-coding genes and other genetic elements, and work by the Magnome team has classified these elements into phylogenetic, structural, and functional categories. These analyses must be extended to systematically cover the range of relations defined above, and will constitute a fundamental resource for the development of dynamic models.

Genome dynamics and evolutionary mechanisms

We develop algorithms for detecting historical relations between genomes and exploring the concrete events and general mechanisms of molecular evolution, in particular mechanisms of rearrangement and duplication that reshape genomes.

Genome rearrangements on two scales contribute to this systematic comparison. Using a complete analysis of gene fusion events across the yeasts and fungi, we identify small-scale events that lead to the birth of new genes and the acquisition of new or improved functions. On a larger scale, rearrangements of large segments are investigated through a combination of conserved segment identification (in silico chromosomal painting using chromosomal homology established using conserved protein families) and combinatorial techniques we have developed for median genome and rearrangement scenario computation.
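To make the notion of large-scale rearrangement concrete, here is a small Python sketch of a standard breakpoint count between two signed gene orders. This is a textbook measure used for illustration, not the median-genome or scenario computation developed by the team; the function name and the example gene orders are hypothetical, and chromosome ends are ignored for simplicity.

```python
def breakpoints(order_a, order_b):
    """Count adjacencies of order_a that are absent from order_b.

    Each order is a list of signed integers: the absolute value identifies a
    conserved segment and the sign gives its orientation on a linear chromosome.
    """
    def adjacencies(order):
        adj = set()
        for x, y in zip(order, order[1:]):
            # The adjacency (x, y) read on the opposite strand is (-y, -x).
            adj.add((x, y))
            adj.add((-y, -x))
        return adj

    adj_b = adjacencies(order_b)
    return sum(1 for pair in zip(order_a, order_a[1:]) if pair not in adj_b)


# Hypothetical example: segments 2 and 3 are inverted together in the second genome.
reference = [1, 2, 3, 4, 5]
inverted = [1, -3, -2, 4, 5]
n_breakpoints = breakpoints(reference, inverted)
```

A single inversion disrupts the two adjacencies at its boundaries and leaves the internal adjacency intact, which is why the count here is two.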

The expected results are a comprehensive view of yeast genome organization and evolution, described at multiple scales.

Hierarchical modeling

We develop practical and semantically rigorous formalisms for constructing hybrid hierarchical models of dynamic, stochastic biological processes, with a particular focus on model reuse, and build software tools for simulation and analysis of these models. BioRica is a formalism for hybrid hierarchical modeling developed by Magnome and instantiated in a software platform.

Formal analysis of biological models usually faces two major challenges. On the one hand, these models exhibit complex behaviors since they may contain both hybrid and stochastic modeling features, which leads to theoretical limitations (undecidability in general). On the other hand, precise models tend to be very large, with thousands of discrete or continuous variables, and moreover with multiple time scales. This leads in practice to the well-known combinatorial explosion problem. We improve the state of the art by adapting strategies that have led to significant successes in modeling human-engineered systems, in particular extending the reach of abstraction-based formal analysis techniques to these models. Both trace-based abstraction and qualitative abstraction of hybrid stochastic systems will be developed.

Validation of this approach is based on applications in dynamic modeling of fermenting and oleaginous yeasts. We will develop a modeling methodology that will, first, advance the state of the art of modular modeling in systems biology and, second, enable mixing phenomena described with different precision within the same framework of stochastic hybrid hierarchical models.

Application Domains

Comparative Genomics of Yeasts

The best way to understand the structure and the evolutionary history of a genome is to compare it with others. At the level of single genes this is a standard and indeed essential procedure: one compares a gene sequence with others in data banks to identify sequence similarities that suggest homology relations. For most gene sequences these relations are the only clues about gene function that are available. The procedure is essential because the difference between the number of genes identified by in silico sequence analysis and the number that are experimentally characterized is several orders of magnitude. At the level of whole genomes, large-scale comparison is still in its infancy but has provided a number of remarkable results that have led to better understanding, on a more global level, of the mechanisms of evolution and of adaptation.

Yeasts provide an ideal subject matter for the study of eukaryotic microorganisms. From an experimental standpoint, the yeast Saccharomyces cerevisiae is a model organism amenable to laboratory use and very widely exploited, resulting in an astonishing array of experimental results.

From a genomic standpoint, yeasts from the hemiascomycete class provide a unique tool for studying eukaryotic genome evolution on a large scale. With their relatively small and compact genomes, yeasts offer a unique opportunity to explore eukaryotic genome evolution by comparative analysis of several species. Yeasts are widely used as cell factories, for the production of beer, wine and bread and more recently of various metabolic products such as vitamins, ethanol, citric acid, lipids, etc. Yeasts can assimilate hydrocarbons (genera Candida, Yarrowia and Debaryomyces), depolymerise tannin extracts (Zygosaccharomyces rouxii) and produce hormones and vaccines in industrial quantities through heterologous gene expression. Several yeast species are pathogenic for humans. The best-known yeast in the Hemiascomycete class is S. cerevisiae, widely used as a model organism for molecular genetics and cell biology studies, and as a cell factory. As the most thoroughly annotated genome of the small eukaryotes, it is a common reference for the annotation of other species. The hemiascomycetous yeasts represent a homogeneous phylogenetic group of eukaryotes with a relatively large diversity at the physiological and ecological levels. Comparative genomic studies within this group have proved very informative.

The Génolevures program is devoted to large-scale comparisons of yeast genomes from various branches of the Hemiascomycete class, with the aim of addressing basic questions of molecular evolution such as the degrees of gene conservation, the identification of species-specific, clade-specific or class-specific genes, the distribution of genes among functional families, the rate of sequence and map divergences and mechanisms of chromosome shuffling.

The differences between genomes can be addressed at two levels: at a molecular level, considering how these differences arise and are maintained; and at a functional level, considering the influence of these molecular differences on cell behavior and more generally on the adaptation of a species to its ecological niche.

Construction of Biological Networks

Comparative genomics provides the means to identify the set of protein-coding genes that comprise the components of a cell, and thus the set of individual functions that can be assured, but a more comprehensive view of cell function must aim to understand the ways that those components work together. In order to predict how genomic differences influence functional differences, it is necessary to develop representations of the ways that proteins cooperate.

One such representation is the network of protein-protein interactions. Protein-protein interactions are at the heart of many important biological processes, including signal transduction, metabolic pathways, and immune response. Understanding these interactions is a valuable way to elucidate cellular function, as interactions are the primitive elements of cell behavior. One of the principal goals of proteomics is to completely describe the network of interactions that underlie cell physiology.

As networks of interaction data become larger and more complex, it becomes more and more important to develop data mining and statistical analysis techniques. Advanced visualization tools are necessary to aid the researcher in interpreting relevant subsets of the data. As databases grow, the risk of false positives or other erroneous results also grows, and it is necessary to develop statistical and graph-theoretic methods for excluding outliers. Most importantly, it is necessary to build consensus networks that integrate multiple sources of evidence. Experimental techniques for detecting protein-protein interactions are largely complementary, and it is reasonable to have more confidence in an interaction that is observed using a variety of techniques than in one that is only observed using one technique.
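The idea of a consensus network can be sketched in a few lines of Python: an interaction is retained only when it is supported by a minimum number of independent detection techniques. This is a deliberately simple voting rule given for illustration, assuming each technique reports unweighted pairs; the method names, protein identifiers and the threshold are hypothetical and do not describe a specific database or tool.

```python
from collections import defaultdict


def consensus_network(evidence, min_methods=2):
    """Keep interactions observed by at least `min_methods` distinct techniques.

    evidence: dict mapping a technique name to a list of (protein, protein) pairs.
    Returns a dict mapping each retained pair (as a frozenset) to its supporting techniques.
    """
    support = defaultdict(set)
    for method, pairs in evidence.items():
        for a, b in pairs:
            # Interactions are undirected, so (a, b) and (b, a) share one key.
            support[frozenset((a, b))].add(method)
    return {pair: methods for pair, methods in support.items()
            if len(methods) >= min_methods}


# Hypothetical observations from three complementary techniques.
evidence = {
    "two_hybrid": [("P1", "P2"), ("P2", "P3")],
    "affinity_purification": [("P2", "P1"), ("P3", "P4")],
    "coexpression": [("P1", "P2"), ("P3", "P4")],
}
network = consensus_network(evidence)
```

Here the P2/P3 interaction is dropped because only one technique reports it. A more realistic scheme would weight each technique by its estimated error rate instead of counting votes equally.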

The ProViz software tool addresses the need for efficient visualization tools, and provides a platform for developing interactive analyses. But the key challenge for comparative analysis of interaction networks is the reliable extrapolation of predicted networks in the absence of experimental data.

A complementary challenge to network prediction is the extraction of useful summaries from interaction data. Existing databases of protein-protein interactions mix different types too freely, and build graph representations that are not entirely sensible, as well as being highly connected and thus difficult to interpret. We have developed a technique called policy-directed graph extraction that provides a framework for selecting observations and for building appropriate graph representations. A concrete example of graph extraction is subtractive pathway modeling, which uses correlated gene loss to identify loss of biochemical pathways.
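The principle behind using correlated gene loss to flag pathway loss can be illustrated with a short Python sketch. It is not the policy-directed graph extraction framework itself, only a simplified stand-in that assumes a pathway is a flat set of gene families and calls it lost when a chosen fraction of those families is absent; the species, family and pathway names and the 0.75 threshold are all hypothetical.

```python
def lost_pathways(presence, pathways, min_lost_fraction=0.75):
    """Report, for each species, pathways in which most member families are missing.

    presence: dict mapping a species name to the set of gene families it contains.
    pathways: dict mapping a pathway name to the set of gene families it requires.
    """
    result = {}
    for species, present in presence.items():
        lost = []
        for name, families in pathways.items():
            missing = [f for f in families if f not in present]
            # Losses are "correlated" when they hit many families of one pathway.
            if families and len(missing) / len(families) >= min_lost_fraction:
                lost.append(name)
        result[species] = sorted(lost)
    return result


# Hypothetical pathways and presence/absence profiles.
pathways = {
    "pathway_X": {"F1", "F2", "F3", "F4"},
    "pathway_Y": {"F5", "F6"},
}
presence = {
    "species_A": {"F1", "F2", "F3", "F4", "F5", "F6"},
    "species_B": {"F4", "F5", "F6"},
}
losses = lost_pathways(presence, pathways)
```

In this example species_B lacks three of the four families of pathway_X and is flagged, while an isolated missing family would not be enough to call a pathway lost.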

Modeling Biological Systems

Realistic, precise simulation of cell behavior requires detailed, precise models and fine-grain interpretation. At the same time, it is necessary that this simulation be computationally tractable. Furthermore, the models must be comprehensible to the biologist, and claims about properties of the model must be expressed at an appropriate level of abstraction. Reaching an effective compromise between these conflicting goals requires that these systems be hierarchically composed, that the overall semantics provide means for combining components expressed in different quantitative or discrete formalisms, and that the simulation admit stochastic behavior and evaluation at multiple time scales.

In general, numerical modeling of biological systems follows the process shown below.

Starting from experimental data, sort possible molecular processes and retain the most plausible.

Build a schema depicting the overall model and refine it until it is composed of elementary steps.

Translate these steps into mathematical expressions using the laws of physics and chemistry.

Translate these expressions into time-dependent differential equations quantifying the changes in the model.

Analyze the differential system to assess the model.

Elaborate predictions based on a more detailed study of the differential system.

Test some selected predictions in vitro or in vivo.
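The middle steps of this process (elementary step, mathematical expression, differential equations, analysis) can be sketched for a single reversible binding reaction A + B <-> C under the mass action law. The Python fragment below is a toy example using a fixed-step forward Euler scheme chosen for brevity; the reaction, rate constants and initial concentrations are hypothetical, and a production model would use an adaptive ODE solver.

```python
def mass_action_rhs(state, k_on, k_off):
    """Time derivatives (dA/dt, dB/dt, dC/dt) for A + B <-> C under mass action."""
    a, b, c = state
    # Net flux = forward rate (proportional to [A][B]) minus reverse rate (proportional to [C]).
    flux = k_on * a * b - k_off * c
    return (-flux, -flux, flux)


def integrate_euler(state, k_on, k_off, dt, steps):
    """Advance the concentrations with a fixed-step forward Euler scheme."""
    for _ in range(steps):
        derivs = mass_action_rhs(state, k_on, k_off)
        state = tuple(x + dt * dx for x, dx in zip(state, derivs))
    return state


# Hypothetical parameters: unit rate constants, A and B start at 1.0, no complex.
final_a, final_b, final_c = integrate_euler(
    (1.0, 1.0, 0.0), k_on=1.0, k_off=1.0, dt=0.001, steps=20000
)
# Analytical steady state for these parameters: (1 - c)^2 = c, so c = (3 - sqrt(5)) / 2.
expected_c = (3 - 5 ** 0.5) / 2
```

Comparing the simulated end point with the analytical steady state is a small instance of the assessment step: the model is checked against a property that can be derived independently.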

This approach has proven substantial properties of various biological processes, for example in the case of the cell cycle. However, it remains tedious and implies a number of limitations that we briefly describe in this section.

Many biochemical processes can be modeled using continuous domains by employing various kinetics based on the mass action law. However, quite a number of biological processes involve small-scale units whose dynamics cannot be approximated using a global approach and need to be considered unit-wise.

Some of the biological systems are now known to have a switch-like behavior and can only be specified in a continuous realm by using zero-order ultra-sensitive parametric functions converging to a sharply sigmoid function, which artificially complexifies the system.

The lack of formalized translations between each step makes the whole modeling process error-prone, since immersing the high-level comprehensible cartoon into a low-level differential formalism is completely dependent on the knowledge of the modeler and his/her mathematical skills. Maybe even worse, it blurs the explanatory power of the schema.

As an illustration of the last point, it is well known that the same high-level process of the lysis/lysogeny decision in lambda bacteriophage infecting an E. coli cell can be specified using different low-level formalisms, each producing unique results contradicting the others.

The assessment step of the modeling process is usually conducted by slow and painful parameter tinkering, upon which some artificial integrators and rate constants are added to fit the model to the experimental data without any clue as to what meanings these integrators could have biologically speaking.
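To illustrate the limitation noted above concerning small-scale units that must be treated individually, the following Python sketch simulates a birth-death process molecule by molecule with the Gillespie direct method, a standard stochastic simulation algorithm. It is offered as a generic example and not as the simulation scheme of any tool described in this report; the rate constants, initial count and seed are hypothetical.

```python
import random


def gillespie_birth_death(k_prod, k_deg, n0, t_end, seed=0):
    """Simulate production (rate k_prod) and degradation (rate k_deg * n) of one species.

    Returns the trajectory as a list of (time, molecule_count) pairs.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    trajectory = [(t, n)]
    while True:
        production = k_prod
        degradation = k_deg * n
        total = production + degradation
        if total == 0.0:
            break  # no reaction can fire any more
        # The waiting time to the next reaction is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its rate.
        if rng.random() * total < production:
            n += 1
        else:
            n -= 1
        trajectory.append((t, n))
    return trajectory


trajectory = gillespie_birth_death(k_prod=5.0, k_deg=0.5, n0=0, t_end=10.0, seed=1)
```

Each event changes the count by exactly one molecule, so low copy numbers and their fluctuations are represented explicitly instead of being averaged into a continuous concentration.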

Two complementary approaches are necessary for model validation. The first is the validation from the computer science point of view, and is mainly based on intrinsic criteria. The second is the external validation, and in our case requires confirmation of model predictions by biological experiments.

In addition to classic measures such as indexes of cluster validity, our use of intrinsic criteria in comparative genomics depends on treatment of the organism as a system. We define coherency rules for predictions that take into account essential genes, requirements for connectivity in biochemical pathways, and, in the case of genome rearrangements, biological rules for genome construction. These rules are defined at appropriate levels in each application.

Experimental validation is made possible by collaboration with partner laboratories in the biological sciences.

Software

Magus: Collaborative Genome Annotation

Participants: David James Sherman (contact), Pascal Durrens, Tiphaine Martin.

As part of our contribution to the Génolevures Consortium, we have developed over the past few years an efficient set of tools for web-based collaborative annotation of eukaryote genomes. The Magus genome annotation system (http://magus.gforge.inria.fr) integrates genome sequences and sequence features, in silico analyses, and views of external data resources into a familiar user interface requiring only a web browser. Magus implements the Génolevures annotation workflow and enforces curation standards to guarantee consistency and integrity. As a novel feature the system provides a workflow for simultaneous annotation of related genomes through the use of protein families identified by in silico analyses; this has resulted in a three-fold increase in curation speed, compared to one-at-a-time curation of individual genes. This allows us to maintain Génolevures standards of high-quality manual annotation while efficiently using the time of our volunteer curators.

Magus is built on: a standard sequence feature database, the Stein lab generic genome browser, various biomedical ontologies (http://obo.sf.net), and a web interface implementing a representational state transfer (REST) architecture.

See also the web page http://magus.gforge.inria.fr/.

Faucils: Analyzing Genome Rearrangement

Participants: Macha Nikolski, Adrien Goëffon, Géraldine Jean, David James Sherman (contact), Tiphaine Martin.

The Faucils suite uses evolutionary and combinatorial algorithms to facilitate mathematical exploration of eukaryote genome rearrangement. It is composed of a number of cooperating tools: SyDIG, a method for detecting synteny in distantly related genomes; SuperBlocks, a method for computing ancestral superblocks; Faucils, tools for computing median genomes and rearrangement trees using stochastic local search and ant colony optimization; and Virage, a tool for interactive visual exploration of divergent rearrangement scenarios.

These tools are developed internally on the INRIA Gforge site and are licensed under CeCILL.

BioRica: Multi-scale Stochastic Modeling

Participants: David James Sherman, Macha Nikolski (contact), Hayssam Soueidan, Nicolás Loira, Grégoire Sutre.

Multi-scale modeling provides one avenue to better integrate continuous and event-based modules into a single scheme. The word multi-scale itself can be interpreted both at the level of building the model, and at the level of model simulation. At the modeling level, it involves building modular and hierarchical models. An attractive feature of such modeling is that it provides a systematic means to balance the need for greater biological detail against the need for simplicity. At the execution level, it implies the co-existence of phenomena operating at different time scales in an integrated fashion. This is a very lively research topic by itself, and has promising applications to biology.

We are developing BioRica, a high-level modeling framework integrating discrete and continuous multi-scale dynamics within the same semantics field. BioRica has been adopted as an INRIA Technology Development Action (ADT).

The co-existence of continuous and discrete dynamics is assured by a pre-computation of the continuous parts of the model. Once computed, these parts of the model act as components that can be queried for the function value, but also modified, therefore accounting for any trajectory modification induced by discrete parts of the model. To achieve this we extensively rely on methods for solving and simulation of continuous systems by numerical algorithms. As for the discrete part of the model, its role is that of a controller.
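The division of labor described here, a continuous part that can be queried and modified plus a discrete part acting as controller, can be sketched generically in Python. This fragment is not BioRica code and does not use its API: the class, function and parameter names are hypothetical, the continuous dynamics is a simple exponential growth solved in closed form, and the controller is a single threshold rule checked on a fixed time grid.

```python
import math


class ContinuousPart:
    """Continuous component x' = rate * x, solved in closed form so it can be queried at any time."""

    def __init__(self, x0, rate, t0=0.0):
        self.x0, self.rate, self.t0 = x0, rate, t0

    def value(self, t):
        # Query the precomputed trajectory at time t.
        return self.x0 * math.exp(self.rate * (t - self.t0))

    def reset(self, t, new_x):
        # A discrete event modifies the trajectory by restarting it from new_x at time t.
        self.x0, self.t0 = new_x, t


def simulate(part, threshold, dilution, t_end, dt):
    """Discrete controller: whenever the value reaches `threshold`, dilute it by `dilution`."""
    events = []
    steps = int(round(t_end / dt))
    for i in range(1, steps + 1):
        t = i * dt
        x = part.value(t)
        if x >= threshold:
            part.reset(t, x * dilution)
            events.append(t)
    return events


# Hypothetical setup: the quantity doubles every time unit and is cut to a quarter at each event.
part = ContinuousPart(x0=1.0, rate=math.log(2))
events = simulate(part, threshold=3.0, dilution=0.25, t_end=10.0, dt=0.5)
```

The continuous component never integrates step by step; it only answers queries and accepts resets, which is the property that lets a discrete-event engine drive it.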

As a means to counteract the over-genericity of re-usable modular models and their underlying simulation complexity, BioRica will provide an abstraction module, whose aim is to preserve only the pertinent information for a given task. The soundness of this approach is ensured by a formal study of the operational semantics of BioRica models that adopts the theoretical framework of abstract interpretation.

The current stage of development extends the AltaRica modeling language to Stochastic AltaRica Dataflow semantics, and also provides parsers for the widely used SBML data exchange format. The corresponding simulator is easy to use and computationally efficient.

See also the web page http://www.labri.fr/.

Génolevures On Line: Comparative Genomics of Yeasts

Participants: David James Sherman, Pascal Durrens, Macha Nikolski, Tiphaine Martin (contact).

The Génolevures online database (http://cbi.labri.fr/Genolevures/) provides tools and data relative to 9 complete and 10 partial genome sequences determined and manually annotated by the Génolevures Consortium, to facilitate comparative genomic studies of hemiascomycetous yeasts. With their relatively small and compact genomes, yeasts offer a unique opportunity for exploring eukaryotic genome evolution. The new version of the Génolevures database provides truly complete (subtelomere to subtelomere) chromosome sequences, 48 000 protein-coding and tRNA genes, and in silico analyses for each gene element. A new feature of the database is a novel collection of conserved multi-species protein families and their mapping to metabolic pathways, coupled with an advanced search feature. Data are presented with a focus on relations between genes and genomes: conservation of genes and gene families, speciation, chromosomal reorganization and synteny. The Génolevures site includes an area for specific studies by members of its international community.

The focus of the Génolevures database is to describe the relations between genes and genomes. We curate relations of orthology and paralogy between genes, as individuals or as members of protein families, chromosomal map reorganization and gain and loss of genes and functions. We do not provide detailed annotations of individual genes and proteins of S. cerevisiae, which are already carefully maintained by the MIPS in the CYGD database (http://mips.gsf.de/projects/fungi) in Europe and by the SGD (http://www.yeastgenome.org/) in North America, as well as in general-purpose databases such as UniProtKB and EMBL.

While extensive chromosomal rearrangements combined with segmental and massive duplications make comparisons of yeast genome sequences difficult, relations of homology between protein-coding genes can be identified despite their great diversity at the molecular level. Families of homologous proteins provide a powerful tool for appreciating conservation, gain and loss of function within yeast genomes. Génolevures provides a unique collection of paralogous and orthologous protein families, identified using a novel consensus clustering algorithm applied to a complementary set of homeomorphic (sharing full-length sequence similarity and similar domain architectures) and nonhomeomorphic systematic Smith-Waterman and Blast sequence alignments. Similar approaches are developed on a wider scale and are complementary to these yeast-specific families.

The Génolevures database uses a straightforward object model mapped to a relational database. Flexibility in the design is guaranteed through the use of ontologies and controlled vocabularies: the Sequence Ontology for DNA sequence features and GLO, our own ontology for comparative genomics (D. Sherman, unpublished data). Browsing of genomic maps and sequence features is provided by the Generic Genome Browser. The Blast service is provided by NCBI Blast 2.2.6. The Génolevures web site uses a REST architecture internally and extensively uses the BioPerl package for manipulation of sequence data.

See also the web page http://cbi.labri.fr/Genolevures/.

New Results

Genome annotation of protoploid Saccharomycetaceae

Participants: David James Sherman (contact), Pascal Durrens, Macha Nikolski, Tiphaine Martin, Adrien Goëffon, Géraldine Jean.

Using our whole genome annotation pipeline (defined by David Sherman and Tiphaine Martin), we have successfully carried out a complete annotation and analysis of four new genomes, provided to the Génolevures Consortium by the Centre National de Séquençage - Génoscope (Évry) and by the Washington University Genome Sequencing Center (St. Louis, USA). This result required a year of work by a network of 20 experts from 6 partner labs, using the Magus web-based system for collaborative genome annotation, and hundreds of hours of computation on our dedicated 76-core computing cluster. The analysis of these results, performed by members of the Consortium, includes identification of 17 500 novel genes, genome comparative cartography and breakpoint analysis, assessment of protein family-specific phylogenetic trees and fast-evolving genes, and definition of a molecular clock through characterization of families of homologous and orthologous protein-coding genes. This major result was published in Genome Research.

Modeling through comparative genomics

Participants: David James Sherman (contact), Rodrigo Assar-Cuevas, Nicolás Loira.

Using comparative genomics to inform mathematical models of cell function is a central challenge of the MAGNOME research program. Emmanuelle Beyne developed in silico methods for predicting protein complexes, one form of protein-protein interaction that provides the building blocks of cell machinery. These predictions were compared to experimental results from gel electrophoresis. This work was extended in a large-scale experimental study using quantitative proteomics and expression data, during a long-term visit to Prof. Steve Oliver's lab at Cambridge University. Florian Iragne has refined his methods for subtractive modeling of biochemical pathways, using his algorithmic framework for policy-directed graph extraction to identify cases of pathway loss through a search for correlated gene losses. Nicolás Loira has used a large dataset of protein families from the Génolevures complete genomes and sub-partitioned it through clustering methods to obtain reliable indications of enzyme conservation in nine species. The resulting determination of enzyme conservation is mapped to biochemical reaction models and used to infer stoichiometric models that are currently being evaluated through comparison with experimental results produced by Prof. Nicaud's group at AgroParisTech.
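The mapping from enzyme conservation to reaction models can be pictured with a small sketch. It assumes, purely for illustration, that each protein family carries zero or more EC numbers and that a reaction is kept for a species when at least one family annotated with its EC number is present there. The family names, species names and data layout are hypothetical, and the sketch stops at per-species reaction sets rather than building a stoichiometric matrix.

```python
def conserved_reactions(family_presence, family_to_ec, reactions):
    """Return, for each species, the reactions supported by a conserved family.

    family_presence: species -> set of protein family identifiers present
    family_to_ec:    family identifier -> list of EC numbers
    reactions:       reaction name -> EC number of the catalysing enzyme
    """
    result = {}
    for species, families in family_presence.items():
        # EC numbers covered by at least one family found in this species
        available = {ec for fam in families for ec in family_to_ec.get(fam, ())}
        result[species] = sorted(
            name for name, ec in reactions.items() if ec in available
        )
    return result

# Hypothetical toy data: two species, three families, three reactions.
family_presence = {
    "species_1": {"fam_A", "fam_B", "fam_C"},
    "species_2": {"fam_A"},
}
family_to_ec = {
    "fam_A": ["2.7.1.1"],   # hexokinase
    "fam_B": ["4.1.3.1"],   # isocitrate lyase
    "fam_C": ["2.3.3.9"],   # malate synthase
}
reactions = {
    "hexokinase": "2.7.1.1",
    "isocitrate_lyase": "4.1.3.1",
    "malate_synthase": "2.3.3.9",
}
model = conserved_reactions(family_presence, family_to_ec, reactions)
```

Here the second species loses both glyoxylate-cycle reactions because the corresponding families are absent, which is the kind of correlated loss a subtractive approach would look for.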

Analysis of oenological genomes

Participants: David James Sherman, Pascal Durrens (contact), Elisabeth Bon.

Two activities contributed to improved understanding of the relation between genome variation and efficiency of cell factory microorganisms used in wine making. The first, led by Pascal Durrens, is the analysis and mapping of the genome variations involved in quantitative traits. In collaboration with the ISVV, we detect and map single nucleotide polymorphisms (SNPs) associated with fermentation parameters during wine fermentation by oenological yeasts. The results will be exploited both in yeast strain improvement (selection of the relevant gene variants) and in modeling of the fermenting cell (indication of the key metabolic steps).

The second is led by Elisabeth Bon. Through her association with Magnome, the team has acquired new expertise on prokaryotic models, and notably on the non-pathogenic food production bacterium Oenococcus oeni. This species is part of the natural microflora of wine and related environments, and is the main agent of malolactic fermentation (MLF), a step of wine making that generally follows alcoholic fermentation (AF) and contributes to wine deacidification, improvement of sensorial properties and microbial stability. The start, duration and completion of MLF are unpredictable since they depend both on the wine characteristics and on the properties of the O. oeni strains. Elisabeth is in charge of sequencing effort coordination, explorative and comparative genome data analysis, and comparative genomics. In comparative genomics, we investigated gene repertoire and genomic organization conservation through intra- and inter-species genomic comparisons, which clearly show that the O. oeni genome is highly plastic and fast-evolving. Preliminary results reveal that the optimal adaptation of a strain to wine mostly depends on the presence of key adaptive loops and polymorphic genes. They also point to the role of horizontal gene transfer and mobile genetic elements in O. oeni genome plasticity, and give the first clues to the genetic origin of its oenological aptitudes.

Algorithms for genome rearrangements

Participants: David James Sherman, Macha Nikolski (contact), Géraldine Jean.

We developed an improved algorithm, SyDIG, for identifying synteny in distant genomes. It is designed for widespread cases where existing methods, such as filtered genome alignments (e.g. GRIMM-Synteny) or profile-based iterated search (e.g. i-AdHoRe), do not work. This in turn has led to improvements in our method for identifying super-blocks of syntenic segments, improving on and building a bridge between competing methods defined by Sankoff and by Bourque and Pevzner. Super-blocks represent the semantics of the ancestral architecture, and provide a piecewise approximation to this architecture that provides a reasonable upper bound on the sum of rearrangement distances between contemporary genomes and the theoretical median. Super-blocks have been successfully identified for a range of species in the Hemiascomycetous yeasts.

Using a new formulation in terms of optimization, we devised a new algorithm, FAUCILS, using techniques from optimization by local search and metaheuristics. The algorithm maintains a population of configurations, modified depending on the set of architectures, and evaluated using the rearrangement distance. The result is a robust approach that converges rapidly and obtains better results than those reported elsewhere. Compared with competing algorithms currently used, this new algorithm takes only a few minutes, compared to several hours; does so on tens of genomes, compared to a maximum of three; and includes biological constraints such as centromere presence and gene super-block conservation, which competing algorithms do not. A follow-up to FAUCILS uses ant colony swarming to identify pairwise rearrangement scenarios.
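The optimization view of the median problem can be sketched in a few lines. The code below is not FAUCILS: it is a plain hill-climbing search over a single configuration (rather than a population), it uses the breakpoint distance as a simple stand-in for the rearrangement distance, and it ignores centromeres and super-block constraints. Genomes are assumed to be signed gene orders on a single linear chromosome, and all function names are hypothetical.

```python
import random

def adjacencies(genome):
    """Adjacencies of a signed gene order, readable from either strand."""
    adj = set()
    for a, b in zip(genome, genome[1:]):
        adj.add((a, b))
        adj.add((-b, -a))
    return adj

def breakpoint_distance(g1, g2):
    """Number of adjacencies of g1 that are not conserved in g2."""
    adj2 = adjacencies(g2)
    return sum((a, b) not in adj2 for a, b in zip(g1, g1[1:]))

def total_distance(candidate, genomes):
    """Objective: sum of distances from the candidate to every genome."""
    return sum(breakpoint_distance(candidate, g) for g in genomes)

def reverse_segment(genome, i, j):
    """Signed reversal of the segment genome[i..j]."""
    return genome[:i] + [-x for x in reversed(genome[i:j + 1])] + genome[j + 1:]

def local_search_median(genomes, iterations=2000, seed=0):
    """Hill climbing over reversals, accepting moves that do not worsen the score."""
    rng = random.Random(seed)
    best = list(genomes[0])
    best_score = total_distance(best, genomes)
    n = len(best)
    for _ in range(iterations):
        i = rng.randrange(n)
        j = rng.randrange(i, n)
        candidate = reverse_segment(best, i, j)
        score = total_distance(candidate, genomes)
        if score <= best_score:
            best, best_score = candidate, score
    return best, best_score

# Three hypothetical contemporary gene orders over six genes.
genomes = [
    [1, 2, 3, 4, 5, 6],
    [1, -3, -2, 4, 5, 6],
    [1, 2, 3, -5, -4, 6],
]
median, score = local_search_median(genomes)
```

A population-based method would keep several such candidates and recombine or perturb them, but the objective function and the neighbourhood (reversals) play the same role.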

Gene fusion and fission events

Participants: David James Sherman, Pascal Durrens (contact), Macha Nikolski, Razanne Issa.

One consequence of genome remodelling in evolution is that these large-scale events can modify genes on the periphery of the duplicated or displaced segment, either by fusion with other genes, or by fission of a gene into several parts. These events produce radical changes in gene content, compared to the more progressive modifications produced by nucleotide substitution, and induce non-treelike, reticulate relations between genes. We have developed a novel algorithmic method for large-scale detection of gene fusion and fission events in fungal genomes, that explicitly uses relations between groups of paralogous genes in order to compensate for genome redundancy. By tracking the mathematical relations between groups of similar genes, rather than between individual genes, we can paint a global picture of remodelling across many species simultaneously. Indeed, fusion and fission events are landmarks of random remodelling, independent of mutation rate: they define a metric of “recombination distance.” This distance lets us build a genome evolution history of species and may well be a better measure than mutation distance of the process of adaptation.

Definition of the BioRica platform

Participants: David James Sherman (contact), Macha Nikolski, Grégoire Sutre, Alice Garcia.

A major development in 2005–2009 was the creation of BioRica, an extension of the AltaRica modeling language for complex industrial systems. BioRica is a high-level modeling framework integrating discrete and continuous multi-scale dynamics within the same semantic domain, while offering an easy-to-use and computationally efficient numerical simulator. It is based on a generic formalism that captures a range of discrete and continuous formalisms and admits a precise operational semantics. BioRica models have a corresponding compositional semantics in terms of an extension of Generalized Markov Decision Processes. This semantics allowed us to prove that BioRica models admit an operational semantics in terms of continuous stochastic processes, and that this operational semantics is correctly simulated by the discrete event stepper used during numerical simulation.

The simulation schema for a given BioRica node is given by a hybrid algorithm that deals with continuous time and allows discrete events to roll back the time according to these discrete interruptions. Time advances optimally either by the maximal step size defined by an adaptive integration algorithm, or by discrete jumps defined by the minimal delay necessary for firing a discrete event.
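The time-advance rule described above can be sketched as follows. This is not the BioRica simulator: it assumes a scalar state, explicit Euler integration with a fixed maximal step in place of an adaptive integrator, and discrete events that are scheduled in advance at known non-negative times, so no rollback is needed. The function `simulate` and its parameters are hypothetical names chosen for this example.

```python
import heapq

def simulate(x0, derivative, events, t_end, max_step=0.1):
    """Hybrid stepper: continuous integration interrupted by timed discrete events.

    events is a list of (time, update) pairs, where update maps the current
    state to the state after the discrete event has fired.
    """
    # The index breaks ties so that the heap never compares update functions.
    queue = [(time, index, update) for index, (time, update) in enumerate(events)]
    heapq.heapify(queue)
    t, x = 0.0, x0
    trace = [(t, x)]
    while t < t_end:
        next_event = queue[0][0] if queue else float("inf")
        # Advance by the maximal integration step or up to the next event,
        # whichever comes first, without overshooting the end of the run.
        t_next = min(t + max_step, next_event, t_end)
        x = x + (t_next - t) * derivative(t, x)  # explicit Euler step
        t = t_next
        # Fire every discrete event that is due at the new time.
        while queue and queue[0][0] <= t:
            _, _, update = heapq.heappop(queue)
            x = update(x)
        trace.append((t, x))
    return trace

# Exponential decay with a single pulse that adds 1.0 to the state at t = 1.0.
trace = simulate(1.0, lambda t, x: -x, [(1.0, lambda x: x + 1.0)], t_end=2.0)
```

Because the step is clipped to the next event time, the event fires exactly at its scheduled instant instead of at the nearest integration step.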

BioRica is instantiated in a software platform for modeling and simulation, that has recently been adopted by the INRIA through a Technology Development Action (ADT).

Transient Behavior in Parametrized Dynamic Models

Participants: Macha Nikolski (contact), Hayssam Soueidan, Grégoire Sutre.

Dynamic models in Systems Biology rely on kinetic parameters to represent the range of possible behaviors when enzymatic information is incomplete. Analysis of these parametrized models aims at identifying either parameter ranges yielding similar qualitative behaviors, or parameter values yielding a given behavior of interest. Qualitative transient behavior can be successfully analyzed by model checking algorithms applied to models admitting a computable path semantics. However, in Systems Biology, state explosion and negative decidability results limit the scope of model checking to a certain subset of models. Moreover, some published and curated Systems Biology models lack explicit semantics, and for these “black box” models, not much can be assumed, except the possibility of generating simulation results. Mining these simulation results to identify parameter regions yielding similar behaviors is hindered by the size of the parameter space to explore, numerical artifacts and the lack of a formal definition of what it means for simulation results to be similar.

We introduce Qualitative Transition Systems (QTS) and define their probabilistic semantics. A novel abstraction operation is defined with the goal of building QTSs from simulation results. We show that when constructing a QTS from an ODE, the QTS construction can be made independent of the numerical integration scheme. Trajectory comparison using QTS can be made more resistant to noise by detecting points of interest (extrema and inflection points) through the construction of a piecewise linear approximation (PLA). We have validated our approach on a large set of SBML models from the BioModels database, including:

The cell cycle model of Tyson et al. (1991) based on interactions between Cdc2 and cyclin, where we investigate “similar” oscillatory behaviors with different transient behaviors.

The MAPK cascade model with negative feedback of Kholodenko (2000), in which we can compute the probability of oscillatory behavior in a large parameter subspace.

The model of crosstalk between an extracellular signal regulated kinase ERK and the Wnt pathway of Kim, Rath et al. (2007), successfully detecting the irreversible pathological response in the oncogenic positive feedback loop.
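As a rough illustration of comparing simulation results through their points of interest, the sketch below reduces a sampled trajectory to its sequence of local maxima and minima. It does not build a QTS or a piecewise linear approximation: it uses a simpler hysteresis rule, in which a change of direction is accepted only once it exceeds a tolerance, and it ignores inflection points. The function name and the `tolerance` parameter are assumptions made for this example.

```python
def qualitative_signature(values, tolerance=0.0):
    """Sequence of 'max'/'min' labels for the local extrema of a trajectory.

    A turning point is recorded only when the trajectory moves away from its
    running extreme by more than the tolerance, which filters small noise.
    """
    signature = []
    direction = 0            # +1 rising, -1 falling, 0 not yet known
    low = high = values[0]   # running extremes since the last turning point
    for v in values[1:]:
        low, high = min(low, v), max(high, v)
        if direction >= 0 and v < high - tolerance:
            if direction == 1:
                signature.append("max")
            direction, low = -1, v
        elif direction <= 0 and v > low + tolerance:
            if direction == -1:
                signature.append("min")
            direction, high = 1, v
    return signature

# A clean oscillation and a noisy version with the same qualitative shape.
clean = [0, 1, 2, 1, 0, 1, 2, 3, 2]
noisy = [0, 1, 0.9, 2, 1.1, 0, 0.1, 1, 2, 3, 2]

same_shape = qualitative_signature(noisy, tolerance=0.2) == qualitative_signature(clean)
```

With a tolerance of zero the small wiggles in the noisy trajectory produce spurious extrema, while a tolerance of 0.2 gives both trajectories the same signature.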

Other Grants and Activities

International Activities

HUPO Proteomics Standards Initiative

Participants: David James Sherman (contact), Julie Bourbeillon.

We participate actively in the Proteomics Standards Initiative (PSI) of the Human Proteome Organization (HUPO), an international structure for the development and advancement of technologies for proteomics. The HUPO PSI develops quality and representation standards for proteomic and interactomic data. The principal standards are PSI-MI, for molecular interactions, and PSI-MS, for mass spectrometric data. These standards were presented in the journal Nature Biotechnology. Our project ProteomeBinders (see below) has been accepted as a HUPO PSI working group.

Génolevures Consortium

Participants: David James Sherman, Pascal Durrens (contact), Macha Nikolski, Tiphaine Martin.

Since 2000 our team has been a member of the Génolevures Consortium (GDR CNRS), a large-scale comparative genomics project that aims to address fundamental questions of molecular evolution through the sequencing and comparison of 14 species of hemiascomycetous yeasts. The Consortium comprises 16 partners in France, Belgium, and England (see http://cbi.labri.fr/Genolevures/). Within the Consortium our team is responsible for bioinformatics, both for the development of resources for exploiting comparative genomic data and for research in new methods of analysis.

In 2004 this collaboration with the 60+ biologists of the Consortium carried out the complete genomic annotation and global analysis of four eukaryotic genomes sequenced for us by the National Center for Sequencing (Génoscope, Évry). This annotation consisted of the ab initio identification of candidate genes and gene models through analysis of genomic DNA, the determination of genes coding for proteins and pseudo-genes, and the association of information about the supposed function of each protein and its phylogenetic relations. For this global analysis in particular we developed a novel method for constructing multi-species protein families and detailed analyses of the gain and loss of genes and functions throughout evolution.

This long-standing collaboration continues in two ways. First, a number of new projects are underway, concerning several new genomes currently being sequenced, and new questions about the mechanisms of gene formation. Second, we continue to develop and improve the Génolevures On Line database, to whose maintenance our team has a longstanding commitment.

Systems Biology Markup Language

Participant: Macha Nikolski (contact).

Macha Nikolski has recently become involved in the standards process for version 3 of the SBML standard, in particular in defining a rigorous mathematical semantics for timed events and hierarchical compositions.

European Activities

Yeast Systems Biology Network (FP6)

Participants: David James Sherman, Macha Nikolski (contact).

Our team is actively involved in the Yeast Systems Biology Network (YSBN) Coordinated Action, sponsored by the EU sixth framework programme. The allocated budget is 1.3 million Euros. The CA is coordinated by Prof. Jens Nielsen (Technical University of Denmark) and involves 17 European universities and 2 start-up biotech companies: InNetics AB and Fluxome Sciences A/S.

The activities of this CA aim at facilitating and improving research in yeast systems biology. The EU team creates standardised research methods and reference databases, develops inter-laboratory benchmarking, and organizes an international conference, a number of PhD courses, and workshops.

The project involves most of the best EU academic centres in this field of science: Biozentrum University of Basel, Bogazici University Istanbul, Budapest University of Technology and Economics and Hungarian Academy of Sciences, CNRS/LaBRI University Bordeaux, ETH Zurich, Gothenburg University, Manchester University, Lund University, Max Planck Institute of Molecular Genetics, Medical University Vienna, Stuttgart University, Technical University of Denmark, Technical University Delft, University of Milano Bicocca, Vrije Universiteit Amsterdam, VTT Technical Research Centre Finland.

ProteomeBinders (FP6)

Participants: David James Sherman (contact), Julie Bourbeillon.

The ProteomeBinders Coordination Action, sponsored by the EU sixth framework programme, coordinates the establishment of a European resource infrastructure of binding molecules directed against the entire human proteome. The allocated budget is 1.8 million Euros. The CA is coordinated by Prof. Mike Taussig of the Babraham Institute in the UK.

A major objective of the “post-genome” era is to detect, quantify and characterise all relevant human proteins in tissues and fluids in health and disease. This effort requires a comprehensive, characterised and standardised collection of specific ligand binding reagents, including antibodies, the most widely used such reagents, as well as novel protein scaffolds and nucleic acid aptamers. Currently there is no pan-European platform to coordinate systematic development, resource management and quality control for these important reagents.

The ProteomeBinders Coordination Action (proteomebinders.org) coordinates 26 European partners and two in the USA, several of which operate infrastructures or large scale projects in aspects including cDNA collections, protein production, polyclonal and monoclonal antibodies. They provide a critical mass of leading expertise in binder technology, protein expression, binder applications and bioinformatics. Many have tight links to SMEs in binder technology, as founders or advisors. The CA will organise the resource by integrating the existing infrastructures, reviewing technologies and high throughput production methods, standardising binder-based tools and applications, assembling the necessary bioinformatics and establishing a database schema to set up a central binders repository. A proteome binders resource will have huge benefits for basic and applied research, impacting on healthcare, diagnostics, discovery of targets for drug intervention and therapeutics. It will thus be of great advantage to the research and biotechnology communities.

Within ProteomeBinders, our team is responsible for formalizing an ontology of binder properties and a set of requirements for data representation and exchange, and for developing a database schema based on these specifications that could be used to set up a central repository of all known ligand binders against the human proteome. The adoption of the proposed standards by the scientific community will determine the success of this activity.

IntAct

Participants: David James Sherman (contact), Julie Bourbeillon.

The IntAct project, led by the European Bioinformatics Institute (EBI) within the framework of the European project TEMBLOR (The European Molecular Biology Linked Original Resources), develops a federated European database of protein-protein interactions and their annotations. IntAct partners develop a normalized representation of annotated protein interaction data and the necessary ontologies, a protocol for data exchange between the nodes of the federated database, and a software infrastructure for the installation of these local nodes. Within this infrastructure, a large number of software tools have been developed to help biological users exploit these data reliably and efficiently. Our own tool Proviz is part of this set of tools. Curator annotation, optimization, and quality control tools have also been developed. We also submit experimental data to the repository.

National Activities

ANR GENARISE

Participants: David James Sherman (contact), Pascal Durrens, Macha Nikolski, Tiphaine Martin.

GENARISE is a four-year ANR project that explores the question of how genes arise and die. Coordinated by Prof. Bernard Dujon of the Pasteur Institute, this multidisciplinary project uses an original combination of complementary experimental and informatic techniques to answer specific questions about the mechanisms of genome dynamics. The Magnome team contributes much of the informatics expertise in this project and in particular serves as a resource for in silico techniques.

ANR DIVOENI

Participant: Elisabeth Bon (contact).

Elisabeth Bon of Magnome is a partner in DIVOENI, a four-year ANR project concerning intraspecies biodiversity of Oenococcus oeni, a lactic acid bacterium of wine. The project is coordinated by Prof. Aline Lonvaud of the Université Victor Ségalen Bordeaux 2. The aims of the programme are: 1) to evaluate the genetic diversity of a vast collection of strains, to set up phylogenetic groups, and then to investigate relationships between the ecological niches and the essential phenotypic traits, from which hypotheses on the evolution of the species and on the genetic stability of strains will be drawn; 2) to propose methods based on molecular markers to make better use of the diversity of the species; 3) to measure the impact of the repeated use of selected strains on the diversity in the ecosystem and to draw conclusions for its preservation.

INRA-INRIA Oleaginous Yeasts

Participants: David James Sherman (contact), Nicolás Loira.

We have been working with the research teams of Cécile Neuvéglise and Jean-Marc Nicaud at INRA Grignon on the analysis and modeling of oleaginous yeasts and their genomes. We have performed genome sequence surveys of several related species and are developing a consensus metabolic model for species in the Yarrowia clade. These activities will continue in the context of the CAER (Alternative Fuels for Aeronautics) project funded by the French DGAC.

Regional Activities

Aquitaine Region “Services robustes pour les réseaux dynamiques (SR2D)”

Participants: David James Sherman (contact), Pascal Durrens, Natalia Golenetskaya.

In the wider context of the regional project supporting a research pole in informatics, we work with other experts in data-mining and visualization on the application of these techniques to genomic data. In particular we have developed novel methods for constructing summaries of large data sets, which are coupled with graph visualization techniques in the Tulip platform.

Aquitaine Region “Identification de nouveaux QTL chez la levure pour la sélection de levains œnologiques”

Participant: Pascal Durrens (contact).

This project is a collaboration between the company SARCO, specialized in the selection of industrial yeasts with distinct technological abilities, the FCBA technology institute, and the CNRS. The goal is to use genome analysis to identify chromosomal regions (QTLs) responsible for different physiological capabilities, as a tool for selecting yeasts for wine fermentation through efficient crossing strategies. Pascal Durrens is leading the bioinformatic analysis of the genomic and experimental data.

Dissemination

Reviewing

David Sherman was a reviewer for the journal Bioinformatics (Oxford University Press).

David Sherman was a reviewer for the journal BMC Bioinformatics (BioMed Central).

David Sherman was a reviewer for the journal Nucleic Acids Research (Oxford University Press).

David Sherman was a reviewer for the national program GIS IBiSA.

Pascal Durrens was a reviewer for the journal PLoS Computational Biology (Public Library of Science).

Pascal Durrens was part of the thesis jury for Nicolas Jauniaux at the University of Strasbourg.

Macha Nikolski was a member of the program committee for WABI'09.

Memberships and Responsibilities

Pascal Durrens is responsible for scientific diffusion, and David Sherman is head of Bioinformatics, for the Génolevures Consortium.

Tiphaine Martin is a member of the Local Committee of the INRIA Bordeaux Sud-Ouest.

Tiphaine Martin is a member of the GIS-IBiSA GRISBI-Bioinformatics Grid working group.

Tiphaine Martin and David Sherman are members of the Institut de Grilles, and Tiphaine is active in the Biology/Health working group.

David Sherman is a member of the Comité Consultatif Régional de Recherche et de Développement Technologique (CCRRDT) de la Région Aquitaine, Commission 3 “Sciences biologiques, médicales et de la santé” (alternate for Claude Kirchner).

David Sherman is a member of the Scientific Council of the LaBRI UMR 5800/CNRS.

Recruiting committees

Macha Nikolski was a member of the CR recruiting committee of the INRIA Saclay.

Pascal Durrens was a member of the selection committee for the University of Strasbourg.

Elisabeth Bon was a member of the selection committee for the University of Strasbourg.

Tiphaine Martin was a member of the IR selection committee for the INRA Toulouse.

Visitors

Nikolai Vyahhi of St. Petersburg University, Russia, was invited for three months as a visiting researcher.

Participation in colloquia, seminars, invitations

David Sherman

23/01/2009 Paris Génolevures

03/02/2009–04/02/2009 Grignon Collaboration INRA

26/02/2009–27/02/2009 Paris

22/03/2009–24/03/2009 Alpbach Austria ProteomeBinders

09/04/2009–22/04/2009 Chicago Invitation U. Chicago

23/04/2009–24/04/2009 Paris Génolevures

26/05/2009 Paris Génolevures

27/05/2009–30/05/2009 Strasbourg Collaboration ULP

21/06/2009–23/06/2009 Paris INRIA

14/09/2009–15/09/2009 Paris INRIA

18/09/2009 Paris GENARISE

08/10/2009 Paris INRIA Evaluation Seminar

06/11/2009 Paris Génolevures

27/11/2009 Paris DIKARYOME

17/12/2009–18/12/2009 Paris INRIA

Pascal Durrens

22/02/2009–27/02/2009 Marseille Winter school at CIRM (Mathematical modeling of cancer)

05/03/2009–06/03/2009 Paris Génolevures

23/03/2009–24/03/2009 Paris Launch of the Nakaseomycetes project at Génoscope

11/05/2009 Strasbourg Selection committee, Associate Professor (MdC)

14/05/2009–15/05/2009 Paris Génolevures

02/06/2009–03/06/2009 Strasbourg Selection committee, Associate Professor (MdC)

25/06/2009–26/06/2009 Paris Génolevures

08/10/2009–09/10/2009 Paris Génolevures

17/10/2009–22/10/2009 Sant Feliu Conference "Comparative Genomics of Eukaryotic Microorganisms"

Tiphaine Martin

23/01/2009 Paris Génolevures

24/02/2009 Strasbourg Génolevures

06/03/2009 Paris Génolevures

10/03/2009–11/03/2009 Lyon Grisbi (bioinformatics grid)

27/03/2009 Paris Thesis defense of Célia Payen

06/04/2009–10/04/2009 Nancy Grid5000 school

23/04/2009–24/04/2009 Montpellier IR INRA jury

04/05/2009–06/05/2009 Roscoff Grisbi (bioinformatics grid)

15/05/2009 Paris Génolevures

26/05/2009–27/05/2009 Lyon Kick-off Grisbi

28/05/2009–30/05/2009 Strasbourg “2nd German/French/European Meeting on Yeast and Filamentous Fungi”

03/06/2009–05/06/2009 Cambridge Grisbi-EMI meeting

08/06/2009–12/06/2009 Nantes JOBIM

26/06/2009 Paris Génolevures

29/06/2009–01/07/2009 Toulouse IE INRA competition jury

18/09/2009 Paris GENARISE

07/10/2009 Paris Génolevures (public site), small working group

08/10/2009 Paris INRIA Evaluation Seminar

17/10/2009–22/10/2009 San Feliu/Spain; EMBO "Comparative Genomics of Eukaryotic Microorganisms"

06/11/2009 Paris Génolevures

04/12/2009 Paris Génolevures

07/12/2009 Paris Evolution et Biodiversité Bactérienne : impact du séquençage de nouvelle génération, conference

Invitation round table “Promouvoir les filières scientifiques auprès des jeunes filles,” Infosup Dordogne carrière, Perigueux

Conference organization for Biograle 2009, Rennes 24/11/2009–25/11/2009

Macha Nikolski

09/04/2009–22/04/2009 Chicago Invitation U. Chicago

09/07/2009–10/07/2009 Grenoble Seminar IBIS

14/09/2009–15/09/2009 Hinxton Collaboration N. Le Novère

08/10/2009 Paris INRIA Evaluation Seminar

06/12/2009–23/12/2009 Moscow Building international relations

Hayssam Soueidan

09/07/2009–10/07/2009 Grenoble Seminar IBIS

30/08/2009–02/09/2009 Bologna Italy Computational Methods in Systems Biology (CMSB'09)

14/09/2009–15/09/2009 Hinxton Collaboration N. Le Novère

20/10/2009–31/10/2009 Moscow Building international relations

06/12/2009–23/12/2009 Moscow Building international relations

Nicolás Loira

15/05/2009 Grignon

07/06/2009–12/06/2009 Nantes JOBIM 2009

27/07/2009–31/07/2009 Grignon YALI genome-scale metabolic model

10/10/2009–11/10/2009 Paris; "Challenges in experimental data integration within genome-scale metabolic models" (Université Pierre et Marie Curie)

17/10/2009–22/10/2009 San Feliu/Spain; EMBO "Comparative Genomics of Eukaryotic Microorganisms"

30/11/2009–01/12/2009 Grignon YALI genome-scale metabolic model

Anasua Sarkar

17/10/2009–22/10/2009 San Feliu/Spain; EMBO "Comparative Genomics of Eukaryotic Microorganisms"

Teaching

Elisabeth Bon is on the faculty of the Université Victor Ségalen Bordeaux 2 and teaches courses in bioinformatics and cellular biology.

All of the doctoral students in Magnome have teaching duties as teaching assistants, in the Universities Bordeaux 1 and Victor Ségalen Bordeaux 2, or the ENSEIRB. Post-doc Julie Bourbeillon teaches bioinformatics and statistics at the Université Victor Ségalen Bordeaux 2.

Marie-Line Seret M.-L. Serge Casaregola S. Laurence Despons L. Cecile Fairhead C. Gilles Fischer G. Ingrid Lafontaine I. Veronique Leh V. Marc Lemaire M. Jacky De Montigny J. Cecile Neuveglise C. Agnès Thierry A. Isabelle Blanc-Lenfle I. Claudine Bleykasten C. Julie Diffels J. Emilie Fritsch E. Lionel Frangeul L. Adrien Goëffon A. Nicolas Jauniaux N. Rym Kachouri-Lafond R. Célia Payen C. Serge Potier S. Lenka Pribylova L. Christophe Ozanne C. Guy-Franck Richard G.-F. Christine Sacerdot C. Marie-Laure Straub M.-L. Emmanuel Talla E. 1088-9051 Genome Research 2009 epub ahead of print http:// hal. inria. fr/ inria-00407511/ en/ US Genolevures: protein families and synteny among complete hemiascomycetous yeast proteomes and genomes. David James Sherman D. J. Tiphaine Martin T. Macha Nikolski M. Cyril Cayla C. Jean-Luc Souciet J.-L. Pascal Durrens P. 0305-1048 Nucleic Acids Research (NAR) 2009 D550-D554 http:// hal. inria. fr/ inria-00341578/ en/ Hierarchical study of Guyton Circulatory Model Rodrigo Assar R. Hayssam Soueidan H. David James Sherman D. J. Irena Rusu Eric Rivals I. R. Les Journées Ouvertes en Biologie, Informatique et Mathématiques JOBIM 2009, France Nantes Eric Rivals, Irena Rusu 2009 http:// hal. inria. fr/ inria-00404135/ en/ Journées Ouvertes Biologie Informatique Mathématiques 10 JOBIM Oenococcus oeni genome plasticity associated with adaptation to wine, an extreme ecological niche Elisabeth Bon E. Arnaud Delaherche A. Eric Bilhere E. Cécile Miot-Sertier C. Pascal Durrens P. Antoine De Daruvar A. Aline Lonvaud-Funel A. Claire Le Marrec C. Eric Rivals E. Irena Rusu I. JOBIM-10èmes Journées Ouvertes en Biologie, Informatique et Mathématiques, France Nantes 2009 121-122 http:// hal. inria. fr/ inria-00392020/ en/ Journées Ouvertes Biologie Informatique Mathématiques 10 JOBIM A 24 kb-genomic island contributing to Oenococcus oeni adaptation to wine can excise from the chromosome Elisabeth Bon E. Cécile Miot-Sertier C. Marguerite Dols-Lafargue M. 
Guillaume Morel G. Aline Lonvaud-Funel A. Claire Le Marrec C. 16th CBL-Club des Bactéries Lactiques Meeting, France Toulouse 2009 http:// hal. inria. fr/ inria-00392022/ en/ Meeting du Club des Bactéries Lactiques 16 CBL IS30 elements as mediators of strain diversity in Oenococcus oeni Fatima El Garniti F. Cécile Miot-Sertier C. Marguerite Dols-Lafargue M. Elisabeth Bon E. Aline Lonvaud-Funel A. Claire Le Marrec C. 16th CBL- Club des Bactéries Lactiques Meeting, France Toulouse 2009 http:// hal. inria. fr/ inria-00396032/ en/ Meeting du Club des Bactéries Lactiques 16 CBL Oenococcus oeni genomic adaptation to the wine environment Patrick Lucas P. Elisabeth Bon E. Vincent Renouf V. Marguerite Dols-Lafargue M. Claire Le Marrec C. Aline Lonvaud-Funel A. SGM-Society for General Microbiology Autumn 2009 Meeting, Royaume-Uni Edinburg 2009 http:// hal. inria. fr/ inria-00395530/ en/ Meeting of the Society for General Microbiology 2009 SGM The Génolevures online database Tiphaine Martin T. Macha Nikolski M. David James Sherman D. J. Jean-Luc Souciet J.-L. Pascal Durrens P. Second German/ French/European/ Meeting Yeast and Filamentous Fungi, France Strasbourg 2009 http:// hal. inria. fr/ inria-00409534/ en/ German French European Meeting on Yeast and Filamentous Fungi 2 Base de données Génolevures : génomique comparative des Hemiascomycetes Tiphaine Martin T. David James Sherman D. J. Macha Nikolski M. Jean-Luc Souciet J.-L. Pascal Durrens P. Eric Rivals E. Irena Rusu I. Journée Ouvertes Biologie Informatique Mathématiques, JOBIM 2009, France Nantes 2009 181-182 http:// hal. inria. fr/ inria-00401915/ en/ Journées Ouvertes Biologie Informatique Mathématiques 10 JOBIM Qualitative Transition Systems for the Abstraction and Comparison of Transient Behavior in Parametrized Dynamic Models Hayssam Soueidan H. Grégoire Sutre G. Macha Nikolski M. Computational Methods in Systems Biology (CMSB'09), Italie Bologna 5688 Springer Verlag 2009 313–327 http:// hal. archives-ouvertes. 
fr/ hal-00408909/ en/ International Conference on Computational Methods in Systems Biology 7 CMSB Swarming Along the Evolutionary Branches Sheds Light on Genome Rearrangement Scenarios Nikolay Vyahhi N. Adrien Goëffon A. David James Sherman D. J. Macha Nikolski M. Franz Rothlauf F. ACM SIGEVO Conference on Genetic and evolutionary computation, Canada Montréal ACM ACM SIGEVO

2009 http:// hal. inria. fr/ inria-00407508/ en/ Genetic and Evolutionary Computation Conference 2009 GECCO RU Extending Discrete Event Systems for the Hierarhical Specification, Analysis and Simulation of Systems Biology Models Hayssam Soueidan H. Séminaire équipe INRIA IBIS, France 2009 http:// hal. archives-ouvertes. fr/ hal-00407522/ en/ Gapped BLAST and PSI-BLAST: a new generation of protein database search programs Stephen F. Altschul S. F. Thomas L. Madden T. L. Alejandro A. Schäffer A. A. Jinghui Zhang J. Zheng Zhang Z. Webb Miller W. David J. Lipman D. J. Nucleic Acids Res. 25 1997 3389–3402 The Universal Protein Resource (UniProt) A. Bairoch A. R. Apweiler R. C.H. Wu C. W.C. Barker W. B. Boeckmann B. et al. e. Nucleic Acids Res. 33 2005 D154–D159 Genomic Exploration of the Hemiascomycetous Yeasts: 4. The genome of Saccharomyces cerevisiaerevisited G. Blandin G. Pascal Durrens P. F. Tekaia F. M. Aigle M. M. Bolotin-Fukuhara M. Elisabeth Bon E. S. Casarégola S. J. de Montigny J. C. Gaillardin C. A. Lépingle A. B. Llorente B. A. Malpertuy A. C. Neuvéglise C. O. Ozier-Kalogeropoulos O. A. Perrin A. S. Potier S. J.-L. Souciet J.-L. E. Talla E. C. Toffano-Nioche C. M. Wésolowski-Louvel M. C. Marck C. B. Dujon B. FEBS Letters 487 1 December 2000 31-36 SGD: Saccharomyces Genome Database J.M. Cherry J. C. Adler C. C Ball C. S.A. Chervitz S. S.S. Dwight S. E.T. Hester E. Y. Jia Y. G. Juvik G. T. Roe T. M. Schroeder M. S. Weng S. D. Botstein D. Nucleic Acids Res. 26 1998 73–79 Finding functional features in Saccharomycesgenomes by phylogenetic footprinting P. Cliften P. P. Sudarsanam P. A. Desikan A. L. Fulton L. B. Fulton B. J. Majors J. R. Waterston R. B. A. Cohen B. A. M. Johnston M. Science 301 2003 71–76 Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints P. Cousot P. R. Cousot R. 
Conference Record of the Fourth ACM Symposium on Principles of Programming Languages January 1977 238–252 The Ashbya gossypii genome as a tool for mapping the ancient Saccharomyces cerevisiae genome F. S. Dietrich F. S. et al. e. Science 304 2004 304-7 Genome Evolution in Yeasts Bernard Dujon B. David James Sherman D. J. et al. e. Nature 430 2004 35–44 The Sequence Ontology: a tool for the unification of genome annotations K. Eilbeck K. S.E. Lewis S. C.J. Mungall C. M. Yandell M. L. Stein L. R. Durbin R. M. Ashburner M. Genome Biology 6 2005 R44 Principled design of the modern Web architecture R. Fielding R. R.N. Taylor R. ACM Trans. Internet Technol. 2 2002 115–150 The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models M. Hucka M. et al. e. Bioinformatics 19 4 2003 524-31 The EMBL Nucleotide Sequence Database C. Kanz C. P. Aldebert P. N. Althorpe N. W. Baker W. A. Baldwin A. K. Bates K. et al. e. Nucleic Acids Res. 33 database issue 2005 D29–D33 Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae M. Kellis M. BW Birren B. ES Lander E. Nature 428 2004 617-24 Sequencing and comparison of yeast species to identify genes and regulatory elements M. Kellis M. N. Patterson N. M. Endrizzi M. B. Birren B. E. S. Lander E. S. Nature 423 2003 241–254 From molecular noise to behavioural variability in a single bacterium. E. Korobkova E. T. Emonet T. J. Vilar J. T.S. Shimizu T. P. Cluzel P. Nature 428 2004 574–578 Eucaryotic genome evolution through the spontaneous duplication of large chromosomal segments R. Koszul R. S. Caburet S. B. Dujon B. G. Fischer G. EMBO Journal 23 1 2004 234-43 MIPS: a database for genomes and protein sequences HW. Mewes H. D. Frischman D. U. Guldener U. G. Mannhaupt G. K. Mayer K. M. Mokrejs M. B. Morgenstern B. M. Munsterkotter M. S. Rudd S. B. Weil B. Nucleic Acids Res. 
30 1 January 2002 31–34 Identification of common molecular subsequences T. F. Smith T. F. M.S. Waterman M. Journal of Molecular Biology 147 1981 195–197 FEBS Letters Special Issue: Génolevures J.-L. Souciet J.-L. et al. e. FEBS Letters 487 1 December 2000 Syntaxe, Sémantique et abstractions de programmes AltaRica Dataflow Hayssam Soueidan H. Macha Nikolski M. Grégoire Sutre G. Université de bordeaux 1 2005 http:// www. labri. fr/ ~soueidan/ Masters thesis The BioPerl Toolkit: Perl modules for the life sciences J.E. Stajich J. D. Block D. K. Boulez K. S.E. Brenner S. S.A. Chervitz S. et al. e. Genome Res. 12 2002 1611-18 The Generic Genome Browser: A building block for a model organism system database L. D. Stein L. D. Genome Res. 12 2002 1599-1610 Integrative Analysis of Cell Cycle Control in Budding Yeast J.J. Tyson J. Katherine C. Chen K. C. Laurence Calzone L. Attila Csikasz-Nagy A. Frederick R. Cross F. R. Bela Novak B. Mol. Biol. Cell 15 8 2004 3841-3862 http:// www. molbiolcell. org/ cgi/ content/ abstract/ 15/ 8/ 3841 PIRSF: family classification system at the Protein Information Resource C.H. Wu C. A. Nikolskaya A. H. Huang H. L.S. Yeh L. D.A. Natale D. C.R. Vinayaka C. Z.Z. Hu Z. R. Mazumder R. S. Kumar S. P. Kourtesis P. et al. e. Nucleic Acids Res. 32 2004 D315–D318 Fusion and fission of genes define a metric between fungal genomes. Pascal Durrens P. Macha Nikolski M. David James Sherman D. J. PLoS Computational Biology 4 10 2008 e1000200 http:// hal. inria. fr/ inria-00341569/ en/ An Efficient Probabilistic Population-Based Descent for the Median Genome Problem Adrien Goëffon A. Macha Nikolski M. David James Sherman D. J. Proceedings of the 10th annual ACM SIGEVO conference on Genetic and evolutionary computation (GECCO 2008), Atlanta United States ACM 2008 315-322 http:// hal. archives-ouvertes. fr/ hal-00341672/ en/ ProViz: protein interaction visualization and exploration Florian Iragne F. Macha Nikolski M. Bertrand Mathieu B. David Auber D. 
David James Sherman D. J. Bioinformatics 21 2005 272-274 http:// hal. inria. fr/ inria-00202436/ en/ Family relationships: should consensus reign?- consensus clustering for protein families Macha Nikolski M. David James Sherman D. J. Bioinformatics 23 2007 e71–e76 http:// hal. inria. fr/ inria-00202434/ en/ Génolevures: comparative genomics and molecular evolution of hemiascomycetous yeasts. David James Sherman D. J. Pascal Durrens P. Emmanuelle Beyne E. Macha Nikolski M. Jean-Luc Souciet J.-L. Nucleic Acids Research (NAR) 32 2004 D315-8 http:// hal. inria. fr/ inria-00407519/ en/ GDR CNRS 2354 "Génolevures" Genolevures complete genomes provide data and tools for comparative genomics of hemiascomycetous yeasts. David James Sherman D. J. Pascal Durrens P. Florian Iragne F. Emmanuelle Beyne E. Macha Nikolski M. Jean-Luc Souciet J.-L. Nucleic Acids Res 34 Database issue 01 2006 D432-5 http:// hal. archives-ouvertes. fr/ hal-00118142/ en/ Genolevures: protein families and synteny among complete hemiascomycetous yeast proteomes and genomes. David James Sherman D. J. Tiphaine Martin T. Macha Nikolski M. Cyril Cayla C. Jean-Luc Souciet J.-L. Pascal Durrens P. Nucleic Acids Research (NAR) 11 2008 http:// hal. inria. fr/ inria-00341578/ en/ epub ahead of print BioRica: A multi model description and simulation system Hayssam Soueidan H. David James Sherman D. J. Macha Nikolski M. F0SBE, Allemagne 2007 279-287 http:// hal. archives-ouvertes. fr/ hal-00306550/ en/