Genomic Exploration of the Hemiascomycetous Yeasts: 4. The genome of Saccharomyces cerevisiae revisited

MAGNOME Models and Algorithms for the Genome

Computational Biology and Bioinformatics

Computational Sciences for Biology, Medicine and the Environment

Laboratoire Bordelais de Recherche en Informatique (LaBRI) CNRS Université de Bordeaux Computational Biology Genomics Genome Dynamics Next Generation Sequencing Models

Magnome is a joint project-team with the PRES Bordeaux (Universities Bordeaux 1 and Bordeaux Ségalen) and the CNRS (LaBRI UMR 5800). All of the members of Magnome are also members of the LaBRI.

David James Sherman (Inria, Researcher, Bordeaux): Team leader; Inria, Senior Researcher (DR); HDR

Anne-Laure Gautier (Inria, Assistant, Bordeaux): Inria

Pascal Durrens (CNRS, Researcher, Bordeaux): CNRS, Junior Researcher (CR); HDR

Tiphaine Martin (CNRS, Technical staff, Bordeaux): CNRS, Research engineer (IR)

Elisabeth Bon (French university, Faculty, Bordeaux): U. Bordeaux Ségalen, Associate Professor (MCF)

Aurélie Goulielmakis (French university, Technical staff, Bordeaux): U. Bordeaux 1, Contract engineer for ANR DIVOENI

Alice Garcia (Inria, Technical staff, Bordeaux): Inria, Contract engineer for BioRica ADT, until 2011-04-30

Florian Lajus (Inria, Technical staff, Bordeaux): Inria, Contract engineer for Magus ADT, since 2011-11-02

Rodrigo Assar (Inria, PhD student, Bordeaux): Inria CORDI-S, until 2011-09-31

Laetitia Bourgeade (French university, PhD student, Bordeaux): Master starting 2011-03-01; MENRT PhD since 2011-10-01

Natalia Golenetskaya (Inria, PhD student, Bordeaux): Inria CORDI-S

Razanne Issa (French university, PhD student, Bordeaux): Exchange Fellowship Syria

Nicolás Loira (Inria, PhD student, Bordeaux): CONICYT Chile until 2011-03-31; Inria, Contract engineer until 2011-07-31

Anasua Sarkar (Inria, PhD student, Bordeaux): EMMA PhD co-reg. Jadavpur University, until 2011-02-10

Anna Zhukova (Inria, PhD student, Bordeaux): Inria CORDI-S, since 2011-10-15

Marie Llubères (foreign university, Visitor, Bordeaux): NSF PIRE from 2011-06-01 to 2011-08-03

Vsevolod Makeev (other affiliation, External collaborator, Bordeaux): Russian Acad. Sci., since 2011-07-21; HDR

Overall Objectives

One of the key challenges in the study of biological systems is understanding how the static information recorded in the genome is interpreted to become dynamic systems of cooperating and competing biomolecules. MAGNOME addresses this challenge through the development of informatic techniques for multi-scale modeling and large-scale comparative genomics:

logical and object models for knowledge representation

stochastic hierarchical models for behavior of complex systems, formal methods

algorithms for sequence analysis, and

data mining and classification.

We use genome-scale comparisons of eukaryotic organisms to build modular and hierarchical hybrid models of cell behavior that are studied using multi-scale stochastic simulation and formal methods. Our research program builds on our experience in comparative genomics, modeling of protein interaction networks, and formal methods for multi-scale modeling of complex systems.

New high-throughput technologies for DNA sequencing have radically reduced the cost of acquiring genome and transcriptome data, and introduced new strategies for whole genome sequencing. The result has been an increase in data volumes of several orders of magnitude, as well as a greatly increased density of genome sequences within phylogenetically constrained groups of species. Magnome develops efficient techniques for dealing with these increased data volumes, and the combinatorial challenges of dense multi-genome comparison.

Highlights

With clinical and academic partners, Magnome participated in the development of a new rapid diagnostic test for yeast pathogens in the Nakaseomycetes class, based on a comparative annotation of six genomes.

These six genomes (five in the class Nakaseomycetes and one strain of S. cerevisiae) were automatically annotated de novo from their raw sequences using our YAGA software.

Through a long-standing collaboration between the LaBRI and Prof. Aline Lonvaud at the Institute of Vine and Wine Sciences of Bordeaux, and under the auspices of the ANR DIVOENI contract (2008-2012), we successfully completed the first comparative exploration of the Oenococcus oeni pan-transcriptome code. The resulting guidelines partially lift the veil on how the genome of this lactic acid bacterium, which is involved in wine fermentation, adapts globally to its environment at the functional and organisational levels.

We released the first whole-genome metabolic model of the oleaginous yeast Yarrowia lipolytica, developed using our Pathtastic software and curated in collaboration with colleagues from the INRA Grignon (model MODEL1111190000 in Biomodels.net).

The complete implementation of the BioRica modeling framework was deposited with the APP and has been released [biorica]. BioRica was developed as an Inria Technology Development Action.

Scientific Foundations

Overview

Fundamental questions in the life sciences can now be addressed at an unprecedented scale through the combination of high-throughput experimental techniques and advanced computational methods from the computer sciences. The new field of computational biology or bioinformatics has grown around intense collaboration between biologists and computer scientists working towards understanding living organisms as systems. One of the key challenges in this study of systems biology is understanding how the static information recorded in the genome is interpreted to become dynamic systems of cooperating and competing biomolecules.

Magnome addresses this challenge through the development of informatic techniques for understanding the structure and history of eukaryote genomes: algorithms for genome analysis, data models for knowledge representation, stochastic hierarchical models for behavior of complex systems, and data mining and classification. Our work is in methods and algorithms for:

Genome annotation for complete genomes, performing syntactic analyses to identify genes, and semantic analyses to map biological meaning to groups of genes.

Integration of heterogeneous data, to build complete knowledge bases for storing and mining information from various sources, and for unambiguously exchanging this information between knowledge bases.

Ancestor reconstruction using optimization techniques, to provide plausible scenarios of the history of genome evolution.

Classification and logical inference, to reliably identify similarities between groups of genetic elements, and infer rules through deduction and induction.

Hierarchical and comparative modeling, to build mathematical models of the behavior of complex biological systems, in particular through combination, reuse, and specialization of existing continuous and discrete models.

The hundred- to thousand-fold decrease in sequencing costs seen in the past few years presents significant challenges for data management and large-scale data mining. Magnome's methods specifically address “scaling out,” where resources are added by installing additional computation nodes, rather than by adding more resources to existing hardware. Scaling out adds capacity and redundancy to the resource, and thus fault tolerance, by enforcing data redundancy between nodes, and by reassigning computations to existing nodes as needed.

Comparative genomics

The central dogma of evolutionary biology postulates that contemporary genomes evolved from a common ancestral genome, but the large scale study of their evolutionary relationships is frustrated by the unavailability of these ancestral organisms that have long disappeared. However, this common inheritance allows us to discover these relationships through comparison, to identify those traits that are common and those that are novel inventions since the divergence of different lineages.

We develop efficient methodologies and software for associating biological information with complete genome sequences, in the particular case where several phylogenetically-related eukaryote genomes are studied simultaneously.

The methods designed by Magnome for comparative genome annotation, structured genome comparison, and construction of integrated models are applied on a large scale to:

eukaryotes from the hemiascomycete class of yeasts, and to

prokaryotes from the lactic bacteria used in winemaking.

Comparative modeling

A general goal of systems biology is to acquire a detailed quantitative understanding of the dynamics of living systems. Different formalisms and simulation techniques are currently used to construct numerical representations of biological systems, and a recurring challenge is that hand-tuned, accurate models tend to be so focused in scope that it is difficult to repurpose them. We claim that, instead of modeling individual processes de novo, a sustainable effort in building efficient behavioral models must proceed incrementally. Hierarchical modeling is one way of combining specific models into networks. Effective use of hierarchical models requires both a formal definition of the semantics of such composition, and efficient simulation tools for exploring the large space of complex behaviors. We have combined theoretical results from formal methods and practical considerations from modeling applications to define BioRica, a framework in which discrete and continuous models can communicate with a clear semantics. Hierarchical models in BioRica can be assembled from existing models, translated into their execution semantics, and then simulated at multiple resolutions through multi-scale stochastic simulation. BioRica models are compiled into a discrete event formalism capable of capturing discrete, continuous, stochastic, non-deterministic and timed behaviors in an integrated and unambiguous way. Our long-term goal is to develop a methodology in which we can assemble a model for a species of interest using a library of reusable models and an organism-level “schematic” determined by comparative genomics.

Comparative modeling is also a matter of reconciling experimental data with models and inferring new models through a combination of comparative genomics and successive refinement.

Application Domains

Function and history of yeast genomes

Yeasts provide an ideal subject matter for the study of eukaryotic microorganisms. From an experimental standpoint, the yeast Saccharomyces cerevisiae is a model organism amenable to laboratory use and very widely exploited, resulting in an astonishing array of experimental results. From a genomic standpoint, yeasts from the hemiascomycete class provide a unique tool for studying eukaryotic genome evolution on a large scale. With their relatively small and compact genomes, yeasts offer a unique opportunity to explore eukaryotic genome evolution by comparative analysis of several species.

Yeasts are widely used as cell factories, for the production of beer, wine and bread and more recently of various metabolic products such as vitamins, ethanol, citric acid, lipids, etc.

Yeasts can assimilate hydrocarbons (genera Candida, Yarrowia and Debaryomyces), depolymerise tannin extracts (Zygosaccharomyces rouxii) and produce hormones and vaccines in industrial quantities through heterologous gene expression.

Several yeast species are pathogenic for humans, especially Candida albicans, Candida glabrata, Candida tropicalis and the Basidiomycete Cryptococcus neoformans.

The hemiascomycetous yeasts represent a homogeneous phylogenetic group of eukaryotes with a relatively large diversity at the physiological and ecological levels. Comparative genomic studies within this group have proved very informative.

Magnome applies its methods for comparative genomics and knowledge engineering to the yeasts through the ten-year-old Génolevures program (GDR 2354 CNRS), devoted to large-scale comparisons of yeast genomes with the aim of addressing basic questions of molecular evolution. We developed the software tools used by the CNRS's genolevures.org web site. Magnome's Magus system for simultaneous genome annotation combines semi-supervised classification and rule-based inference in a collaborative web-based system that explicitly uses comparative genomics to simultaneously analyse groups of related genomes.

Alternative fuels and bioconversion

Oleaginous yeasts are capable of synthesizing lipids from substrates other than glucose, and current research is attempting to understand these conversions with the goal of optimizing their throughput, production and quality. From a genomic standpoint the objective is to characterize genes involved in the biosynthesis of precursor molecules which will be transformed into fuels, which are thus not derived from petroleum. Partner laboratories experimentally study lipid accumulation in oleaginous yeasts such as Yarrowia lipolytica, starting from:

pentoses, produced from lignocellulosic agricultural substrates following a biorefining strategy,

glycerol, a secondary output of chemical production of biodiesel, and

industrial residues.

Lipases from Y. lipolytica are of particular interest. Experimental characterization of the lipid bodies produced from these substrates will aid in the identification of target genes which may serve for genetic engineering. This in turn requires the development of molecular tools for this class of yeasts with strong industrial potential. Magnome's focus is on acquiring genome sequences, predicting genes using models learned from genome comparison and sequencing of cDNA transcripts, and comparative annotation. Our overall goal is to define dynamic models that can be used to predict the behavior of modified strains and thus drive selection and genetic engineering.

Winemaking and improved strain selection

Yeasts and bacteria are essential for the winemaking process, and selection of strains based both on their efficiency and on the influence on the quality of wine is a subject of significant effort in the Aquitaine region. Unlike the species studied above, yeast and bacterial starters for winemaking cannot be genetically modified. In order to propose improved and more specialized starters, industrial producers use breeding and selection strategies.

Yeast starters from the Saccharomyces genus are used for primary, alcoholic fermentation. Recent advances have made it possible to identify the genetic causes of the technological differences between strains. Manipulating the genetic causes rather than the industrial consequences is far more amenable to experimental development. An essential tool in identifying these genetic causes is comparative genomics.

Bacterial starters based on Oenococcus oeni are used in secondary, malolactic fermentation. Genetically, O. oeni presents a surprising level of intra-specific diversity, and there are clues that it may evolve more rapidly than expected. Studying the diversity of the O. oeni genomes has led to genetic tools that can be used to evaluate the predisposition of different strains to respond to oenological stresses. While identifying particular genes has been the leading strategy up to now, a new strategy based on comparative genomics has recently been undertaken to understand the impact and mechanisms of genetic diversity.

Starting from historical collaborations by Pascal Durrens and Elisabeth Bon with partners from the Institute for Wine and Vine Sciences in Bordeaux (ISVV), we have built an effective partnership between Magnome, the UMR Œnology–ISVV, and local industry, to apply our tools to large-scale comparative genomics of yeast and bacterial starters in winemaking.

Knowledge bases for molecular tools

Affinity binders are molecular tools for recognizing protein targets that play a fundamental role in proteomics and clinical diagnostics. Large catalogs of binders are available from competing technologies (antibodies, DNA/RNA aptamers, artificial scaffolds, etc.), and Europe has set itself the ambitious goal of establishing a comprehensive, characterized and standardized collection of specific binders directed against all individual human proteins, including variant forms and modifications. Despite the central importance of binders, they presently cover only a very small fraction of the proteome, and even though there are many antibodies against some targets (for example, >900 antibodies against p53), there are none against the vast majority of proteins. Moreover, widely accepted standards for binder characterization are virtually nonexistent.

Alongside the technical challenges in producing a comprehensive binder resource are significant logistical challenges, related to the variety of producers and the lack of reliable quality control mechanisms. As part of the ProteomeBinders and Affinomics projects, Magnome works to develop knowledge engineering techniques for storing, exploring, and exchanging experimental data used in affinity binder characterization. This work involves databases and tools for molecular interaction data, standards for data exchange between peers, and reporting standards.

Software

Inria Bioscience Resources

Participants: Olivier Collin (contact), Frédéric Cazals, Mireille Régnier, Marie-France Sagot, Hélène Touzet, Hidde de Jong, David Sherman, Marie-Dominique Devignes, Dominique Lavenier.

Inria Bioscience Resources is a portal designed to improve the visibility of bioinformatics tools and resources developed by Inria teams. This portal will help the community of biologists and bioinformaticians understand the variety of bioinformatics projects in Inria, test the different applications, and contact project-teams. Eight project-teams participate in the development of this portal. Inria Bioscience Resources is developed in an Inria Technology Development Action (ADT).

Magus: Collaborative Genome Annotation

Participants: David James Sherman (contact), Pascal Durrens, Natalia Golenetskaya, Florian Lajus, Tiphaine Martin.

As part of our contribution to the Génolevures Consortium, we have developed over the past few years an efficient set of tools for web-based collaborative annotation of eukaryote genomes. The Magus genome annotation system integrates genome sequences and sequence features, in silico analyses, and views of external data resources into a familiar user interface requiring only a Web browser. Magus implements the annotation workflows and enforces curation standards to guarantee consistency and integrity. As a novel feature the system provides a workflow for simultaneous annotation of related genomes through the use of protein families identified by in silico analyses; this has resulted in a three-fold increase in curation speed, compared to one-at-a-time curation of individual genes. This allows us to maintain Génolevures standards of high-quality manual annotation while efficiently using the time of our volunteer curators.
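The idea behind simultaneous annotation can be illustrated with a minimal sketch. It is not Magus code: the function name, the family identifiers and the gene identifiers are all hypothetical, and it assumes that a curator's decision for a protein family can simply be copied to every member gene.

```python
def propagate_family_annotations(families, curated):
    """Copy one curated label per protein family to all of its member genes.

    families: dict mapping a family id to the list of member gene ids,
              possibly drawn from several related genomes.
    curated:  dict mapping a family id to the label chosen by a curator.
    Families without a curated label are skipped.
    """
    annotations = {}
    for family_id, members in families.items():
        label = curated.get(family_id)
        if label is None:
            continue  # not yet curated: leave its genes unannotated
        for gene_id in members:
            annotations[gene_id] = label
    return annotations


# Hypothetical families spanning three genomes (A, B, C).
families = {
    "FAM001": ["genomeA_g0012", "genomeB_g0450", "genomeC_g0101"],
    "FAM002": ["genomeA_g0300", "genomeC_g0777"],
    "FAM003": ["genomeB_g0009"],
}
curated = {
    "FAM001": "hexose transporter",
    "FAM002": "alcohol dehydrogenase",
}
annotations = propagate_family_annotations(families, curated)
```

In this toy example two curation decisions annotate five genes, which is the kind of saving a family-based workflow aims for; a real system would also have to handle exceptions within a family.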

Magus is built on: a standard sequence feature database, the Stein lab generic genome browser, various biomedical ontologies (http://obo.sf.net), and a web interface implementing a representational state transfer (REST) architecture.

For more information see magus.gforge.inria.fr, the Magus Gforge web site. Magus is developed in an Inria Technology Development Action (ADT).

YAGA: Yeast Genome Annotation

Participants: Pascal Durrens, Tiphaine Martin (contact).

With the arrival of new generations of sequencers, laboratories can sequence groups of genomes at lower cost. These genomes can no longer be annotated manually. The objective of the YAGA software is to annotate a raw sequence both syntactically (identifying genetic elements: genes, CDS, tRNAs, centromeres, gaps, ...) and functionally, and to generate EMBL files for publication. The annotation takes into account data from comparative genomics, such as protein family profiles.

Once the constraints of the annotation have been determined, the YAGA software can automatically annotate genomes de novo from their raw sequences. The predictors used by YAGA can also take RNA-seq data into account to reinforce gene prediction. The current settings of the software are intended for the annotation of yeast genomes, but the software can be adapted to other types of species.

BioRica: Multi-scale Stochastic Modeling

Participants: David James Sherman (contact), Rodrigo Assar Cuevas, Alice Garcia.

BioRica is a high-level modeling framework integrating discrete and continuous multi-scale dynamics within the same semantic domain. A BioRica model is hierarchically composed of nodes, which may themselves be existing models. Individual nodes can be of two types:

Discrete nodes are composed of states, and transitions described by constrained events, which can be non deterministic. This captures a range of existing discrete formalisms (Petri nets, finite automata, etc.). Stochastic behavior can be added by associating the likelihood that an event fires when activated. Markov chains or Markov decision processes can be concisely described. Timed behavior is added by defining the delay between an event's activation and the moment that its transition occurs.

Continuous nodes are described by ODE systems, potentially a hybrid system whose internal state flows continuously while having discrete jumps.
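The discrete node description above can be made concrete with a minimal sketch. This is plain Python, not the BioRica modeling language or its API: the class names (Event, DiscreteNode) and the fields (guard, weight, delay) are hypothetical, and the example assumes that, among the events enabled in the current state, one is drawn at random in proportion to its weight and takes effect after its delay.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Event:
    name: str
    guard: Callable[[str], bool]  # constraint: is the event enabled in this state?
    target: str                   # state reached when the transition occurs
    weight: float = 1.0           # relative likelihood of firing when enabled
    delay: float = 0.0            # time between activation and transition


class DiscreteNode:
    def __init__(self, initial: str, events: List[Event], seed: int = 0):
        self.state = initial
        self.time = 0.0
        self.events = events
        self.rng = random.Random(seed)  # seeded for reproducible runs

    def step(self) -> bool:
        """Fire one enabled event; return False if none is enabled."""
        enabled = [e for e in self.events if e.guard(self.state)]
        if not enabled:
            return False
        weights = [e.weight for e in enabled]
        chosen = self.rng.choices(enabled, weights=weights)[0]
        self.time += chosen.delay   # timed behavior
        self.state = chosen.target  # discrete transition
        return True

    def run(self, max_steps: int) -> List[Tuple[float, str]]:
        trace = [(self.time, self.state)]
        for _ in range(max_steps):
            if not self.step():
                break
            trace.append((self.time, self.state))
        return trace


# Hypothetical two-state switch: non-deterministic choice in state "off".
events = [
    Event("activate", lambda s: s == "off", "on", weight=1.0, delay=2.0),
    Event("stay_off", lambda s: s == "off", "off", weight=3.0, delay=1.0),
    Event("deactivate", lambda s: s == "on", "off", weight=1.0, delay=5.0),
]
node = DiscreteNode("off", events, seed=42)
trace = node.run(max_steps=10)
```

With weights normalised per state this behaves like a Markov chain with fixed holding times; a hierarchical model would additionally let nodes contain and synchronise with other nodes, which the sketch leaves out.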

The system has been implemented as a distributable software package.

The BioRica compiler reads a specification for a hierarchical model and compiles it into an executable simulator. The modeling language is a stochastic extension to the AltaRica Dataflow language, inspired by work of Antoine Rauzy. Input parsers for SBML Level 2 Version 4 are currently being validated. The compiled code uses the Python runtime environment and can be run stand-alone on most systems.

For more information see biorica.gforge.inria.fr, the BioRica Gforge web site. BioRica was developed as an Inria Technology Development Action (ADT).

Pathtastic: Inference of whole-genome metabolic models

Participants: David James Sherman (contact), Pascal Durrens, Nicolás Loira, Anna Zhukova.

Pathtastic is a software tool for inferring whole-genome metabolic models for eukaryote cell factories. It is based on metabolic scaffolds, abstract descriptions of reactions and pathways on which inferred reactions are hung and are eventually connected by an iterative mapping and specialization process. Scaffold fragments can be repeatedly used to build specialized subnetworks of the complete model.

Pathtastic uses a consensus procedure to infer reactions from complementary genome comparisons, and an algebra for assisted manual editing of pathways.
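The text does not specify the consensus rule, so the following is only an illustrative sketch under a stated assumption: a reaction is retained when at least a minimum number of independent genome comparisons predict it. The function name, the min_support parameter and the reaction identifiers are hypothetical and are not taken from Pathtastic.

```python
from collections import Counter


def consensus_reactions(predictions, min_support=2):
    """Keep reactions predicted by at least `min_support` comparisons.

    predictions: dict mapping a comparison name to the set of reaction
                 ids inferred from that comparison.
    """
    counts = Counter()
    for reactions in predictions.values():
        counts.update(set(reactions))  # each comparison votes once per reaction
    return {rxn for rxn, votes in counts.items() if votes >= min_support}


# Hypothetical predictions from three complementary comparisons.
predictions = {
    "comparison_1": {"R_HEX1", "R_PGI", "R_PFK"},
    "comparison_2": {"R_HEX1", "R_PGI"},
    "comparison_3": {"R_PGI", "R_ACL"},
}
consensus = consensus_reactions(predictions, min_support=2)
```

Reactions supported by a single comparison, here R_PFK and R_ACL, are the natural candidates for the assisted manual editing step.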

For more information see pathtastic.gforge.inria.fr, the Pathtastic Gforge web site.

Génolevures On Line: Comparative Genomics of Yeasts

Participants: David James Sherman, Pascal Durrens (contact), Natalia Golenetskaya, Tiphaine Martin.

The Génolevures online database provides tools and data for exploring the annotated genome sequences of more than 20 genomes, determined and manually annotated by the Génolevures Consortium to facilitate comparative genomic studies of hemiascomycetous yeasts. Data are presented with a focus on relations between genes and genomes: conservation of genes and gene families, speciation, chromosomal reorganization and synteny. The Génolevures site includes an area for specific studies by members of its international community.

Génolevures online uses the Magus system for genome navigation, with project-specific extensions developed by David Sherman, Pascal Durrens, and Tiphaine Martin. An advanced query system for data mining in Génolevures is being developed by Natalia Golenetskaya. The contents of the knowledge base are expanded and maintained by the CNRS through GDR 2354 Génolevures. Technical support for Génolevures On Line is provided by the CNRS through UMR 5800 LaBRI.

For more information see genolevures.org, the Génolevures web site.

New Results

Yeast comparative genomics

Participants: David James Sherman, Pascal Durrens (contact), Tiphaine Martin, Nicolás Loira.

Using the MAGNOME software developments, including the Magus system and the YAGA software, we have successfully completed a full annotation and analysis of seven new genomes, provided to the Génolevures Consortium by the CEA–Génoscope (Évry). Two distant genomes from the Debaryomycetaceae and mitosporic Saccharomycetales clades of the Saccharomycetales were annotated using previously published Génolevures genomes as references. A further group of five species, comprising pathogenic and nonpathogenic species, was analyzed with the goal of identifying virulence determinants. By choosing species that are highly related but which differ in the particular traits that are targeted, in this case pathogenicity, we are able to focus on the few hundred genes related to the trait. The approximately 40,000 new genes from these studies were classified into existing Génolevures families as well as branch-specific families. The results from these two studies will be published in the coming year.

Assembly, annotation and comparison of Oenococcus strains

Participants: David James Sherman, Pascal Durrens, Elisabeth Bon (contact), Tiphaine Martin, Aurélie Goulielmakis.

Oenococcus oeni is part of the natural microflora of wine and related environments, and is the main agent of malolactic fermentation (MLF), a step of winemaking that generally follows alcoholic fermentation (AF) and contributes to wine deacidification, improvement of sensorial properties and microbial stability. The start, duration and completion of MLF are unpredictable since they depend both on the wine characteristics and on the properties of the O. oeni strains. In collaboration with Patrick Lucas’s lab at the ISVV Bordeaux, which is currently carrying out genome sequencing and exploratory and comparative genomics, Elisabeth Bon has coordinated our efforts in the OENIKITA project since 2009. The project is a scale-switching challenge that includes high-throughput exploratory and comparative genomics for oenological bacterial starters, and the development of an online collaborative multi-genome comparative platform (under development) based on the Génolevures database architecture and the Magus and YAGA systems.

OENI-Genomics axis: In comparative genomics, we investigated gene repertoire and genomic organization conservation through intra- and inter-species genomic comparisons, which clearly show that the O. oeni genome is highly plastic and fast-evolving. Results reveal that the optimal adaptation of a strain to wine mostly depends on the presence of key adaptive loops and polymorphic genes. They also point to the role of horizontal gene transfer and mobile genetic elements in O. oeni genome plasticity, and give the first clues to the genetic origin of its oenological aptitudes. As a result of the scaling-out challenge, we completed the assembly of 19 fully sequenced O. oeni genome variants.

KITA-Genomics (E. Bon, D. Sherman): This project, which focuses on the sequencing, assembly, exploration and comparison of the O. kitaharae genome, has benefited from an international collaboration involving Dr V. Makeev. MAGNOME contributed to the assembly of the genome. The comparison against the O. oeni genomic architecture will help shed light on the evolutionary mechanisms responsible for the atypically long branch of the genus Oenococcus in phylogenetic trees.

Transcriptomic axis (E. Bon, A. Goulielmakis): Under the supervision of E. Bon, Aurélie Goulielmakis has completed for the ANR DIVOENI a detailed manual annotation of a new reference strain of O. oeni and performed comparative transcriptome analysis to identify genes differentially expressed under different culture conditions. We explored and compared how the expression system is called upon when O. oeni strains adapted to grow in some niches are placed under stress-exposure conditions. Monitoring gene expression status between strains, through the definition of a global expression pattern proper to each gene, partially lifts the veil on how the O. oeni genome adapts its function to its environment. The weight of genetic background and ecological niche pressure on gene expression flexibility was evaluated, and the O. oeni pan-transcriptome architecture characterized. The first guidelines revealed a supra-spatial organization of stress response into activated and repressed larger macro-domains defining functional landmarks and intra-chromosomal territories. Decryption of stress-sensitive gene repertoires promises to be an efficient tool in the conquest of O. oeni “domestication” through the identification of molecular markers responsible for different physiological capabilities, and the selection of the best adapted strains.

Gene plasticity modeling (E. Bon, A. Goulielmakis): A new research axis recently emerged on the initiative of E. Bon (pseudOE project) around the detection, characterization and conservation of pseudogene populations in Oenococcus bacteria. This topic is of interest on two counts: phylogenetic, because it should allow a better estimate of the degree of genic and genomic plasticity of these bacteria, and algorithmic, because pseudogenes are a source of confusion for automatic gene prediction. Through a cross-disciplinary collaboration and joint supervision with the Algorithms for Analysis of Biological Structures Group (P. Ferraro, J. Allali) at LaBRI, Laetitia Bourgeade (PhD, Univ. Bordeaux 1) was recruited to develop dedicated methods to improve automatic pseudogene detection, and therefore gene prediction, and to reconstruct the evolutionary history of fossil and modern genes.

Scaling-out

Participants: David James Sherman (contact), Pascal Durrens, Tiphaine Martin, Natalia Golenetskaya, Florian Lajus.

The Tsvetok project in Magnome targets “scaling out” for data and computation, both to improve capacity for handling large volumes of data and to permit more automatic analysis of projects of the “comparative genomics of related species” type, where a set of genomes is sequenced and analyzed as part of the same process. Natalia Golenetskaya has designed and implemented a NoSQL schema through the identification of standard queries, definition of the appropriate query-oriented storage schema, and mapping of structured values to this schema. This prototype is being tested on an Apache Cassandra ring deployed in Magnome's dedicated computing cluster.
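A query-oriented schema stores the data once per standard query, keyed by that query's parameters, rather than once in normalized form. The following sketch emulates this with plain Python dictionaries; it does not use the Cassandra API and does not reproduce the Tsvetok schema. The two queries (genes by family, genes by location) and all field names and identifiers are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical structured gene records.
genes = [
    {"id": "g1", "genome": "A", "chromosome": "chr1", "start": 900, "family": "F1"},
    {"id": "g2", "genome": "A", "chromosome": "chr1", "start": 100, "family": "F2"},
    {"id": "g3", "genome": "B", "chromosome": "chr2", "start": 500, "family": "F1"},
]


def build_query_tables(records):
    """Denormalize records into one lookup table per standard query."""
    by_family = defaultdict(list)    # row key: family id
    by_location = defaultdict(list)  # row key: (genome, chromosome)
    for rec in records:
        by_family[rec["family"]].append(rec["id"])
        by_location[(rec["genome"], rec["chromosome"])].append(
            (rec["start"], rec["id"])
        )
    for row in by_location.values():
        row.sort()  # keep genes ordered by start position within a row
    return dict(by_family), dict(by_location)


def genes_in_range(by_location, genome, chromosome, lo, hi):
    """Answer the location query from a single row, without any join."""
    row = by_location.get((genome, chromosome), [])
    return [gene_id for start, gene_id in row if lo <= start <= hi]


by_family, by_location = build_query_tables(genes)
```

Each query is answered by reading one row, at the cost of writing every gene several times; that trade-off is what makes this kind of layout suitable for adding nodes rather than enlarging a single server.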

Large-scale data-mining such as that required for comparative genomics is fundamentally data-parallel: an initial transformation is applied to every data object of a given type (such as genes or even individual nucleotides), then a statistical machine learning procedure is applied to the transformed data to produce a summary or to learn a classification function. Analyses of this kind are the design goal of the MapReduce paradigm. Using Tsvetok as a generator for Apache Hadoop, Natalia Golenetskaya is designing MapReduce solutions for the principal whole-genome and data-mining analyses used by Magnome for eukaryote and prokaryote comparative genomics.
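The data-parallel pattern described above can be sketched in a few lines of single-process Python. This is not Hadoop code and not one of Magnome's analyses: the chosen task (mean GC content per gene family), the record fields and the function names are hypothetical, picked only to show a per-object transformation followed by a per-key summary.

```python
from collections import defaultdict


def map_gene(gene):
    """Map step: transform one gene into (family, GC fraction)."""
    seq = gene["sequence"].upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    yield gene["family"], gc


def reduce_family(family, values):
    """Reduce step: summarise all values that share a key."""
    return family, sum(values) / len(values)


def map_reduce(records, mapper, reducer):
    """Apply the mapper to every record, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


# Hypothetical input records.
genes = [
    {"family": "F1", "sequence": "GGCC"},
    {"family": "F1", "sequence": "ATAT"},
    {"family": "F2", "sequence": "ATGC"},
]
mean_gc = map_reduce(genes, map_gene, reduce_family)
```

Because map_gene looks at one gene at a time and reduce_family at one key at a time, both steps can be distributed over many nodes without changing the functions themselves.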

Affinity Proteomics: Standards for affinity binders

Participants: David James Sherman (contact), Natalia Golenetskaya.

Last year we successfully completed and released the MIAPAR and PSI-PAR international standards for knowledge representation and data exchange of affinity binder properties, a five-year effort organized as part of the ProteomeBinders and HUPO-PSI consortia. These standards were reported to the research community in Nature Biotechnology and Molecular and Cellular Proteomics, and extend previous work. One long-standing issue is the adoption of these standards by individual researchers in the lab: initial data entry must be simple enough that standards-based reporting can be integrated into the process of writing the paper. We used an extensive dataset of affinity proteomics data to evaluate “last mile” tools for data entry and initial reporting of affinity proteomics data, and identified places where existing tools need to be adapted to meet these specific needs.

Inferring metabolic models

Participants: David James Sherman (contact), Pascal Durrens, Tiphaine Martin, Nicolás Loira, Anna Zhukova.

In collaboration with Prof Jean-Marc Nicaud's lab at the INRA Grignon, we developed the first functional genome-scale metabolic model of an oleaginous yeast. Most work in producing genome-scale metabolic models has focused on model organisms, in part due to the cost of obtaining well-annotated genome sequences and sufficiently complete experimental data for refining and verifying the models. However, for many fungal genomes of biotechnological interest, the combination of large-scale sequencing projects and in-depth experimental studies has made it feasible to undertake metabolic network reconstruction for a wider range of organisms.

An excellent representative of this new class of organisms is Yarrowia lipolytica, an oleaginous yeast studied experimentally for its role as a food contaminant and its use in bioremediation and cell factory applications. As one of the hemiascomycetous yeasts completely sequenced in the Génolevures program, it enjoys a high-quality manual annotation by a network of experts. It is also an ideal subject for studying the role of species-specific expansion of paralogous families, a considerable challenge for eukaryotes in genome-scale metabolic reconstruction. To these ends, we undertook a complete reconstruction of the Y. lipolytica metabolic network.

Methods: A draft model was extrapolated from the S. cerevisiae model iIN800, using in silico methods including enzyme conservation predicted using Génolevures and reaction mapping that maintains compartments. This draft was curated by a group of experts in Y. lipolytica metabolism, and iteratively improved and validated through comparison with experimental data by flux balance analysis. Gap filling, species-specific reactions, and the addition of compartments with the corresponding transport reactions were among the improvements that most affected accuracy.
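The extrapolation step can be pictured with a minimal sketch. It is a simplification under stated assumptions, not the procedure used for the Y. lipolytica model: reactions are plain dictionaries, every listed gene is treated as required, the ortholog table maps a template gene to zero or more target genes (so paralog expansions appear as longer lists), and all identifiers are hypothetical.

```python
def extrapolate_draft(template_reactions, orthologs):
    """Carry template reactions over to a target species.

    A reaction is kept, in its original compartment, when every gene it
    requires has at least one predicted ortholog. Everything else is
    returned separately as a candidate for curation or gap filling.
    """
    draft, unmapped = [], []
    for rxn in template_reactions:
        targets = [orthologs.get(gene, []) for gene in rxn["genes"]]
        if targets and all(targets):
            draft.append({
                "id": rxn["id"],
                "compartment": rxn["compartment"],  # compartment is preserved
                "genes": [sorted(group) for group in targets],
            })
        else:
            # Includes reactions with no gene association at all.
            unmapped.append(rxn["id"])
    return draft, unmapped


# Hypothetical template reactions and ortholog predictions.
TEMPLATE = [
    {"id": "R1", "compartment": "cytosol", "genes": ["YA", "YB"]},
    {"id": "R2", "compartment": "mitochondrion", "genes": ["YC"]},
]
ORTHOLOGS = {"YA": ["T1"], "YB": ["T3", "T2"]}

DRAFT, UNMAPPED = extrapolate_draft(TEMPLATE, ORTHOLOGS)
```

In this toy run, R1 is carried over with two candidate target genes for YB, while R2 lands in the unmapped list, which is where expert curation and gap filling would take over.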

Results: We produced an accurate functional model for Y. lipolytica, MODEL1111190000 in Biomodels.net, that has been qualitatively validated against gene knockouts.

Hierarchical modeling with BioRica

Participants: David James Sherman (correspondent), Tiphaine Martin, Alice Garcia, Rodrigo Assar-Cuevas, Nicolás Loira.

A recurring challenge for in silico modeling of cell behavior is that experimentally validated models are so focused in scope that it is difficult to repurpose them. Hierarchical modeling is one way of combining specific models into networks. Effective use of hierarchical models requires both formal definition of the semantics of such composition, and efficient simulation tools for exploring the large space of complex behaviors.

BioRica is a high-level hierarchical modeling framework for models combining continuous and discrete components. By providing a reliable and functional software tool backed by a rigorous semantics, we hope to advance real adoption of hierarchical modeling by the systems biology community. An understandable and mathematically rigorous semantics will make it easier for practicing scientists to build practical and functional models of the systems they are studying, and to concentrate their efforts on the system rather than on the tool.

Rodrigo Assar formalized two strategies for integrating discrete control with continuous models: coefficient switches, which control the parameters of the continuous model, and strong switches, which choose between different models. This was translated by Alice Garcia into a BioRica specification for hybrid systems that assures integrity of models, allowing composition, reconciliation, and reuse of models with SBML specifications. Rodrigo used this approach to describe two systems: wine fermentation kinetics, and cell fate decisions leading to bone and fat formation. In the first, known models that describe the responses of yeast cells to different temperatures, resources and toxins were reconciled using coefficient switches that gave the best adjustment of the model depending on the initial conditions and fermentation variable. In the second, a combination of accurate models that predict bone and fat formation in response to activation of pathways such as the Wnt pathway, and to changes of conditions affecting these functions such as increases in homocysteine, was used to analyze the responses to treatments for osteoporosis and other bone mass disorders. Our hope is that this is a first step in obtaining in silico evaluations of medical treatments before testing them in vivo or in vitro.
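The difference between the two switching strategies can be shown with a toy simulation. This is not BioRica syntax and not one of the fermentation or bone-formation models: the logistic and decay equations, the thresholds and the explicit Euler scheme are assumptions chosen only to make the distinction concrete.

```python
def logistic(rate, capacity):
    """Return dx/dt for logistic growth with the given coefficients."""
    return lambda x: rate * x * (1.0 - x / capacity)


def decay(rate):
    """Return dx/dt for exponential decay."""
    return lambda x: -rate * x


def coefficient_switch(x):
    # Same model form; a discrete condition selects the parameter value.
    rate = 1.0 if x < 5.0 else 0.2
    return logistic(rate, capacity=10.0)


def strong_switch(x):
    # A discrete condition selects a different model altogether.
    return logistic(1.0, capacity=10.0) if x < 8.0 else decay(0.5)


def simulate(x0, switch, dt=0.01, steps=1000):
    """Explicit Euler integration; the switch is re-evaluated at every step."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x += dt * switch(x)(x)
        trajectory.append(x)
    return trajectory


COEFFICIENT_RUN = simulate(1.0, coefficient_switch)
STRONG_RUN = simulate(1.0, strong_switch)
```

With these toy settings the coefficient-switched run keeps growing, only more slowly once the threshold is crossed, whereas the strong-switched run ends up hovering around the threshold at which the two models alternate.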

Marie Llubères of the University of Puerto Rico visited Magnome, and we established formal relationships between BioRica models and probabilistic Boolean networks.
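To make the notion of a probabilistic Boolean network concrete, the sketch below computes the one-step transition distribution of a tiny network, which is the view under which such a network can be read as a stochastic transition system. It assumes an independent network with synchronous updates, where each node picks one of its candidate functions independently; the network itself is hypothetical and the code does not reproduce the formal correspondence established with BioRica.

```python
from itertools import product


def transition_distribution(state, pbn):
    """Return {next_state: probability} for one synchronous update.

    `pbn[i]` lists the (probability, function) candidates of node i;
    each function maps the current state tuple to a truth value.
    """
    distribution = {}
    for combo in product(*pbn):
        probability = 1.0
        next_state = []
        for weight, function in combo:
            probability *= weight
            next_state.append(int(bool(function(state))))
        key = tuple(next_state)
        distribution[key] = distribution.get(key, 0.0) + probability
    return distribution


# Hypothetical two-node network.
TOY_PBN = [
    [(0.7, lambda s: s[1]), (0.3, lambda s: not s[1])],  # node 0: two candidates
    [(1.0, lambda s: s[0] and s[1])],                    # node 1: deterministic
]

NEXT_FROM_10 = transition_distribution((1, 0), TOY_PBN)
```

Repeating this computation for every state yields the full transition matrix of the underlying Markov chain.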

Contracts and Grants with Industry

Contracts with Industry

SARCO, the research subsidiary of the Laffort group, has entered into a contract with Magnome to develop comparative genomics tools for selecting wine starters. This contract will permit SARCO to take a decisive step in the understanding of oenological microorganisms by obtaining and exploiting the sequences of their genomes. Comparison of the genomes of these strains has become necessary for learning the genetic origin of the phenotypic variations of oenological yeasts and bacteria. This knowledge will permit SARCO to optimize and accelerate the process of selection of the highest-performing natural strains. With the help of Magnome members and their rich experience in comparative analysis of related genomes, SARCO will acquire competence in biological analysis of genomic sequences. At the same time, Magnome members will acquire further experience with the genomes of winemaking microorganisms, which will help us define new tools and methods better adapted to this kind of industrial cell factory.

Grants with Industry

The French Petroleum Institute (Institut français de pétrole-énergies nouvelles) is coordinating a 6 M-Euro contract with the Civil Aviation Directorate (Direction Générale de l'Aviation Civile) on behalf of a large consortium of industrial (EADS, Dassault, Snecma, Turbomeca, Airbus, Air France, Total) and academic (CNRS, INRA, Inria) partners to explore different technologies for alternative fuels for aviation. The CAER project studies biofuel products and their production, improved jet engine design, and the impact of aircraft. Within CAER, Magnome, via CNRS, works with partners from Grignon and Toulouse on the genomics of high-performance oleaginous yeasts.

Partnerships and Cooperations

Regional Initiatives

Aquitaine Region “SAGÉSS” comparative genomics for wine starters

Participants: David James Sherman (correspondent), Pascal Durrens, Elisabeth Bon, Tiphaine Martin, Nicolás Loira.

This project is a collaboration between the company SARCO, which specializes in the selection of industrial yeasts with distinct technological abilities, the ISVV, and Magnome. The goal is to use genome analysis to identify molecular markers responsible for different physiological capabilities, as a tool for selecting yeasts and bacteria for wine fermentation through efficient hybridization and selection strategies. This collaboration has obtained the INNOVIN label.

Aquitaine Region “Oenophages: comparative genomics for oenococcus bacteriophages” (2011-2014)

Participants: David James Sherman (correspondent), Elisabeth Bon.

National Initiatives

ANR DIVOENI, 2008-2012

Participants: Elisabeth Bon (correspondent), Aurélie Goulielmakis.

Elisabeth Bon is a partner in DIVOENI, a four-year ANR project concerning intraspecies biodiversity of the oenological bacterium Oenococcus oeni. Coordinated by Prof. Aline Lonvaud (Univ. Bordeaux Segalen) from the Institute of Vine and Wine Sciences of Bordeaux – Aquitaine, this scientific programme has three aims:

To evaluate the genetic diversity of a vast collection of strains, to set up phylogenetic groups, then to investigate relationships between the ecological niches (cider, wine, champagne) and the essential phenotypical traits. Hypotheses on the evolution in the species and on the genetic stability of strains will be drawn.

To propose methods based on molecular markers to make better use of the diversity of the species.

To measure the impact of the repeated use of selected strains on the diversity in the ecosystem and to draw the conclusions for its preservation.

Elisabeth is in charge of the computational infrastructure dedicated to genomics and post-genomics data storage, handling and analysis. She coordinates collaboration with the CBiB-Centre de Bioinformatique de Bordeaux.

European Initiatives

Affinity Proteomics (FP7)

Participants: David James Sherman (correspondent), Natalia Golenetskaya.

A major objective of the “post-genome” era is to detect, quantify and characterise all relevant human proteins in tissues and fluids in health and disease. This effort requires a comprehensive, characterised and standardised collection of specific ligand binding reagents, including antibodies, the most widely used such reagents, as well as novel protein scaffolds and nucleic acid aptamers. Currently there is no pan-European platform to coordinate systematic development, resource management and quality control for these important reagents.

Magnome is an associate partner of the FP7 “Affinity Proteome” project coordinated by Prof. Mike Taussig of the Babraham Institute and Cambridge University. Within the consortium, we participate in defining community standards for data representation and exchange, and evaluate knowledge engineering tools for affinity proteomics data.

Sustained Collaborations with Major European Organizations

Prof. Mike Taussig: Babraham Institute & Cambridge University

Knowledge engineering for Affinity Proteomics

Henning Hermjakob: European Bioinformatics Institute

Standards and databases for molecular interactions

International Initiatives

Visits of International Scientists

Vsevolod Makeev, Artëm Kasianov, Marie Llubères

Vsevolod Makeev (Senior Researcher at the Russian Academy of Sciences, Vavilov Institute) has been a collaborator for several years. He and his student Artëm Kasianov made several visits to Inria in 2011, and worked with us on genome assembly algorithms, computational identification of sequence motifs, and distributed algorithms for data mining. Vsevolod Makeev was a visiting CNRS Senior Researcher in the LaBRI and Magnome for three months in the fall of 2011.

Marie Llubères visited Magnome from the University of Puerto Rico for two months on a grant from the NSF PIRE program. She worked on hierarchical modeling of biological systems and specifically on bijections between Probabilistic Boolean Networks and the Stochastic Transition Systems used in the BioRica framework.

Internship Hugo Campbell Sills

Hugo Campbell Sills came to Magnome on an Inria International Internship in the summer of 2011, and worked on the discovery of single-nucleotide polymorphisms and their effects in twelve œnological yeast genomes.

Participation In International Programs

Génolevures and Dikaryome Consortia

Participants: David James Sherman (correspondent), Pascal Durrens, Tiphaine Martin, Nicolás Loira, Anasua Sarkar, Anna Zhukova, Florian Lajus.

Since 2000 our team has been a member of the Génolevures Consortium (GDR CNRS), a large-scale comparative genomics project that aims to address fundamental questions of molecular evolution through the sequencing and the comparison of 14 species of hemiascomycetous yeasts. The Consortium comprises 16 partners in France, Belgium, Spain, and the Netherlands (see http://genolevures.org/). Within the Consortium, our team is responsible for bioinformatics and for research into new methods of analysis. Pascal Durrens and Tiphaine Martin of the CNRS are responsible for the development of resources for exploiting comparative genomic data. Pascal Durrens is the editorial manager of the Génolevures on-line resource.

The Dikaryome Consortium is a scientific collaboration between several international partners and the National Center for Sequencing (CEA–Génoscope, Évry) on the sequencing, annotation, and comparative analysis of fungal genomes.

These perennial collaborations continue in two ways. First, a number of new projects are underway, concerning several new genomes currently being sequenced and new questions about the mechanisms of gene formation. Second, we continue to develop and improve the Génolevures On Line database, to whose maintenance our team has a longstanding commitment.

Dissemination

Animation of the scientific community

David Sherman is a member of the editorial board of the journal Computational and Mathematical Methods in Medicine, and a reviewer for several journals in the bioinformatics field.

David Sherman was external reviewer and member of the thesis defense jury for Anne-Ruxandra Carvunis, Grenoble. He was a member and president of the jury for the thesis defense of Anne-Laure Gaillard, Bordeaux. He was thesis director and member of the jury for the thesis defense of Rodrigo Assar. He was a member of the HDR defense jury for Patrick Lucas, Bordeaux.

Pascal Durrens is responsible for scientific diffusion, and David Sherman is head of Bioinformatics, for the Génolevures Consortium.

Pascal Durrens is leader of the “Comparative Genomics” theme and a member of the Scientific Council of the LaBRI UMR 5800/CNRS.

Tiphaine Martin is a member of the Local Committee and of the Local Committee for Occupational Health and Safety of INRIA Bordeaux Sud-Ouest.

Tiphaine Martin is a member of the GIS-IBiSA GRISBI-Bioinformatics Grid working group.

Tiphaine Martin and David Sherman are members of the Institut de Grilles, and Tiphaine is active in the Biology/Health working group.

Elisabeth Bon is a member of the “Comité Technique Paritaire” (2008-2011) and the “Comité Technique de Proximité” (since 2011-10-20) at the Segalen Bordeaux University.

Teaching

David Sherman and Natalia Golenetskaya teach:

Master: Web et Interfaces Homme-Machine (Web and Human-Computer Interfaces), 50h, second-year engineering students, Enseirb-Matmeca (Institut Polytechnique de Bordeaux), Bordeaux

Tiphaine Martin teaches:

Master: Use of the EGEE grid via the GRISBI virtual organisation, 8h, level M2, University Lyon, France

Master: Use of the EGEE grid via the GRISBI virtual organisation, 8h, level M2, INRA Toulouse, France

Doctorate: Use of the MAGUS software, 8h, Génolevures Consortium, France

Tiphaine Martin supervises four Bioinformatics MSc students from the University of Bordeaux:

Master: Development of search tools on Génolevures databases, 6hETC, M1, University Bordeaux 1 and University Bordeaux Segalen, France

Elisabeth Bon is Associate Professor in Bioinformatics and Genomics, and teaches undergraduate courses in computer science in regular STS (Sciences, Technologies & Santé) bachelor’s degrees and research-oriented STS master’s degrees at the Life Sciences Department of the University Bordeaux Segalen (Medicine and Life Sciences schools) and University Bordeaux 1 (Computer and Life Science schools).

Licence: “Introduction to ICTs-Information & Communication Technologies” class (basic and advanced sections), 112h, level L1 and L2, the STS-biology variant Licence program, university, France

Licence: the national “IT and Internet certificate (C2i®), level 1”, 20h, level L2 and L3, the STS-biology variant Licence program, university, France

Master: “Bioinformatics: Computerised resources, data banks and methods”, 60h, level M1, the Biology & Healthcare STS Master program, co-listed with the University Bordeaux 1 (Sciences & Technologies) and the University Bordeaux Segalen (Medicine & Life Sciences), France

Elisabeth Bon is:

Licence: Responsible for the bachelor’s degree “Information Technologies & Internet advanced course”, Life Sciences Department, University Bordeaux, France

Licence: Responsible for the “IT and Internet certificate (C2i®), level 1”, Life Sciences Department, University Bordeaux, France

Licence: Current president (2005-2007; since Sept. 2011) of the “Segalen Bordeaux University IT and Internet certificate (C2i, level 1) committee” in charge of the C2i evaluation and certification for students and continuing education interns, University Bordeaux Segalen, France

Master: Master Theses in Computer Science, speciality in Bioinformatics: L. Bourgeade (2011-02-01 / 2011-08-31), Reconstitution in silico de l’histoire évolutive des pseudogènes chez les bactéries lactiques du genre Oenococcus, M2, University Bordeaux 1, France

Doctorate: Ph.D. Thesis in Computer Science: L. Bourgeade (since 2011-10-01) in cooperation with P. Ferraro and J. Allali at LaBRI, Filtres sur les arborescences modélisant les ARN et plasticité génique, University Bordeaux 1, France

Doctorate: ATER (assistant professor) in computer science for ICT practical workshops and courses in the first year of the Bachelor’s degree, University Bordeaux 1, France

PhD & HdR:

PhD: Rodrigo Assar, Modeling and simulation of Hybrid Systems and Cell factory applications, University Bordeaux 1, defended 2011-10-26, David Sherman

PhD in progress: Nicolás Loira, University Bordeaux 1, Scaffold-based Reconstruction Method for Genome-Scale Metabolic Models, began 2007, David Sherman

PhD in progress: Natalia Golenetskaya, University Bordeaux 1, began 2009, Scaling out for data in comparative genomics, Pascal Durrens and David Sherman

PhD in progress: Razanne Issa, University Bordeaux 1, Analyse symbolique de données génomiques, began 2010, David Sherman

PhD in progress: Anna Zhukova, University Bordeaux 1, Comparative genomics of biotechnological organisms, began 2011, David Sherman

PhD in progress: Anasua Sarkar, University Bordeaux 1, began 2008, Macha Nikolski

PhD in progress: Laetitia Bourgeade, University Bordeaux 1, Filtres sur les arborescences modélisant les ARN et plasticité génique, began 2011, Pascal Ferraro and Elisabeth Bon

Genomic Exploration of the Hemiascomycetous Yeasts: 4. The genome of Saccharomyces cerevisiaerevisited Gaelle Blandin G. Pascal Durrens P. Fredj Tekaia F. Michel Aigle M. Monique Bolotin-Fukuhara M. Elisabeth Bon E. Serge Casaregola S. Jacky de Montigny J. Claude Gaillardin C. Audrey Lépingle A. B. Llorente B. Alain Malpertuy A. Cécile Neuvéglise C. Odile Ozier-Kalogeropoulos O. A. Perrin A. Serge Potier S. Jean-Luc Souciet J.-L. Emmanuel Talla E. Claire Toffano-Nioche C. Micheline Wésolowski-Louvel M. Christian Marck C. Bernard Dujon B. FEBS Letters 487 1 December 2000 31-36 Minimum information about a protein affinity reagent (MIAPAR). Julie Bourbeillon J. Sandra Orchard S. Itai Benhar I. Carl Borrebaeck C. Antoine De Daruvar A. Stefan Dübel S. Ronald Frank R. Frank Gibson F. David Gloriam D. Niall Haslam N. Tara Hiltker T. Ian Humphrey-Smith I. Michael Hust M. David Juncker D. Manfred Koegl M. Zoltán Konthur Z. Bernhard Korn B. Sylvia Krobitsch S. Serge Muyldermans S. Per-Ake Nygren P.-A. Sandrine Palcy S. Bojan Polic B. Henry Rodriguez H. Alan Sawyer A. Martin Schlapshy M. Michael Snyder M. Oda Stoevesandt O. Michael J Taussig M. J. Markus Templin M. Matthias Uhlen M. Silvère Van Der Maarel S. Christer Wingren C. Henning Hermjakob H. David James Sherman D. J. Nature Biotechnology 28 7 07 2010 650-3 http:// hal. inria. fr/ inria-00544750/ en DE GB CA US BE Comparative genomics of protoploid Saccharomycetaceae. Jean-Luc Souciet J.-L. Bernard Dujon B. Claude Gaillardin C. Mark Johnston M. Philippe V Baret P. V. Paul Cliften P. David James Sherman D. J. Jean Weissenbach J. Eric Westhof E. Patrick Wincker P. Claire Jubin C. Julie Poulain J. Valérie Barbe V. Béatrice Ségurens B. Francois Artiguenave F. Véronique Anthouard V. Benoit Vacherie B. Marie-Eve Val M.-E. Robert S Fulton R. S. Patrick Minx P. Richard Wilson R. Pascal Durrens P. Géraldine Jean G. Christian Marck C. Tiphaine Martin T. Macha Nikolski M. Thomas Rolland T. Marie-Line Seret M.-L. 
Serge Casaregola S. Laurence Despons L. Cécile Fairhead C. Gilles Fischer G. Ingrid Lafontaine I. Veronique Leh V. Marc Lemaire M. Jacky De Montigny J. Cécile Neuvéglise C. Agnès Thierry A. Isabelle Blanc-Lenfle I. Claudine Bleykasten C. Julie Diffels J. Emilie Fritsch E. Lionel Frangeul L. Adrien Goëffon A. Nicolas Jauniaux N. Rym Kachouri-Lafond R. Célia Payen C. Serge Potier S. Lenka Pribylova L. Christophe Ozanne C. Guy-Franck Richard G.-F. Christine Sacerdot C. Marie-Laure Straub M.-L. Emmanuel Talla E. Genome Research 19 2009 1696-1709 http:// hal. inria. fr/ inria-00407511/ en/ US BE How to decide which are the most pertinent overly-represented features during gene set enrichment analysis Roland Barriot R. David James Sherman D. J. Isabelle Dutour I. BMC Bioinformatics 8 2007 http:// hal. inria. fr/ inria-00202721/ en/ Integrated multilaboratory systems biology reveals differences in protein metabolism between two reference yeast strains. André B Canelas A. B. Nicola Harrison N. Alessandro Fazio A. Jie Zhang J. Juha-Pekka Pitkänen J.-P. Joost Van Den Brink J. Barbara M Bakker B. M. Lara Bogner L. Jildau Bouwman J. Juan I Castrillo J. I. Ayca Cankorur A. Pramote Chumnanpuen P. Pascale Daran-Lapujade P. Duygu Dikicioglu D. Karen Van Eunen K. Jennifer C Ewald J. C. Joseph J Heijnen J. J. Betul Kirdar B. Ismo Mattila I. Femke I C Mensonides F. I. C. Anja Niebel A. Merja Penttilä M. Jack T Pronk J. T. Matthias Reuss M. Laura Salusjärvi L. Uwe Sauer U. David James Sherman D. J. Martin Siemann-Herzberg M. Hans Westerhoff H. Johannes De Winde J. Dina Petranovic D. Stephen G Oliver S. G. Christopher T Workman C. T. Nicola Zamboni N. Jens Nielsen J. Nature Communications 1 9 12 2010 145 http:// hal. inria. fr/ inria-00562005/ en/ UK ND DE CH TR SE Genome evolution in yeasts Bernard Dujon B. David James Sherman D. J. Gilles Fischer G. Pascal Durrens P. Serge Casaregola S. Ingrid Lafontaine I. Jacky De Montigny J. Christian Marck C. Cécile Neuvéglise C. 
Emmanuel Talla E. Nicolas Goffard N. Lionel Frangeul L. Michel Aigle M. Véronique Anthouard V. Anna Babour A. Valérie Barbe V. Stéphanie Barnay S. Sylvie Blanchin S. Jean-Marie Beckerich J.-M. Emmanuelle Beyne E. Claudine Bleykasten C. Anita Boisramé A. Jeanne Boyer J. Laurence Cattolico L. Fabrice Confanioleri F. Antoine De Daruvar A. Laurence Despons L. Emmanuelle Fabre E. Cécile Fairhead C. Hélène Ferry-Dumazet H. Alexis Groppi A. Florence Hantraye F. Christophe Hennequin C. Nicolas Jauniaux N. Philippe Joyet P. Rym Kachouri-Lafond R. Alix Kerrest A. Romain Koszul R. Marc Lemaire M. Isabelle Lesur I. Laurence Ma L. Héloïse Muller H. Jean-Marc Nicaud J.-M. Macha Nikolski M. Sophie Oztas S. Odile Ozier-Kalogeropoulos O. Stefan Pellenz S. Serge Potier S. Guy-Franck Richard G.-F. Marie-Laure Straub M.-L. Audrey Suleau A. Dominique Swennen D. Fredj Tekaia F. Micheline Wésolowski-Louvel M. Eric Westhof E. Bénédicte Wirth B. Maria Zeniou-Meyer M. Ivan Zivanovic I. Monique Bolotin-Fukuhara M. Agnès Thierry A. Christiane Bouchier C. Bernard Caudron B. Claude Scarpelli C. Claude Gaillardin C. Jean Weissenbach J. Patrick Wincker P. Jean-Luc Souciet J.-L. Nature 430 6995 07 2004 35-44 http:// hal. archives-ouvertes. fr/ hal-00104411/ en/ Fusion and fission of genes define a metric between fungal genomes. Pascal Durrens P. Macha Nikolski M. David James Sherman D. J. PLoS Computational Biology 4 10 2008 e1000200 http:// hal. inria. fr/ inria-00341569/ en/ An Efficient Probabilistic Population-Based Descent for the Median Genome Problem Adrien Goëffon A. Macha Nikolski M. David James Sherman D. J. Proceedings of the 10th annual ACM SIGEVO conference on Genetic and evolutionary computation (GECCO 2008) Atlanta United States ACM 2008 315-322 http:// hal. archives-ouvertes. fr/ hal-00341672/ en/ Family relationships: should consensus reign?- consensus clustering for protein families Macha Nikolski M. David James Sherman D. J. Bioinformatics 23 2007 e71–e76 http:// hal. inria. 
fr/ inria-00202434/ en/ Genolevures: protein families and synteny among complete hemiascomycetous yeast proteomes and genomes. David James Sherman D. J. Tiphaine Martin T. Macha Nikolski M. Cyril Cayla C. Jean-Luc Souciet J.-L. Pascal Durrens P. Nucleic Acids Research (NAR) 2009 D550-D554 http:// hal. inria. fr/ inria-00341578/ en/ Modeling and simulation of Hybrid Systems and Cell factory applications Rodrigo Assar R. Université Sciences et Technologies - Bordeaux I October 2011 http:// hal. inria. fr/ tel-00635273/ en Ph. D. Thesis The Génolevures database Pascal Durrens P. Tiphaine Martin T. David James Sherman D. J. 1631-0691 Comptes Rendus de l'Académie des Sciences, Série Biologies 2011 http:// hal. inria. fr/ inria-00539200/ en Rapid Discrimination between Candida glabrata, Candida nivariensis, and Candida bracarensis by Use of a Singleplex PCR Adela Enache-Angoulvant A. Juliette Guitard J. Frédéric Grenouillet F. Tiphaine Martin T. Pascal Durrens P. Cécile Fairhead C. Christophe Hennequin C. 0095-1137 Journal of Clinical Microbiology 49 9 September 2011 3375-3379 http:// hal. inria. fr/ inria-00625115/ en Modeling Stochastic Switched Systems with BioRica Rodrigo Assar R. Alice Garcia A. David James Sherman D. J. Journées Ouvertes en Biologie, Informatique et Mathématiques JOBIM 2011 Paris, France Institut Pasteur July 2011 297–304 http:// hal. inria. fr/ inria-00617419/ en Journées Ouvertes Biologie Informatique Mathématiques 12 JOBIM RENABI GRISBI Infrastructure Distribuée pour la Bioinformatique Christophe Blanchet C. Clément Gauthey C. Christophe Caron C. Olivier Collin O. Stephane Delmotte S. Tiphaine Martin T. Aurelien Roult A. Franck Samson F. Bruno Spataro B. JOBIM 2011 - Journées Ouvertes Biologie Informatique Mathématique Paris, France June 2011 http:// hal. inria. fr/ hal-00640007/ en Journées Ouvertes Biologie Informatique Mathématiques 12 JOBIM Assessing ”last mile” tools for affinity binder databases Natalia Golenetskaya N. 
David James Sherman D. J. 5th ESF Workshop on Affinity Proteomics: Ligand Binders against the Human Proteome Alpbach, Austria March 2011 http:// hal. inria. fr/ hal-00653518/ en ESF Workshop on Affinity Proteomics : Ligand Binders against the Human Proteome 5 Affinomics Rethinking global analyses and algorithms for comparative genomics in a functional MapReduce style Natalia Golenetskaya N. David James Sherman D. J. Algorithmique, combinatoire du texte et applications en bio-informatique (SeqBio 2011) Lille, France December 2011 http:// hal. inria. fr/ hal-00654797/ en Algorithmique, combinatoire du texte et applications en bio-informatique 2011 SeqBio How does Oenococcus oeni adapt to its environment? A pangenomic oligonucleotide microarray for analysis O. oeni gene expression under wine shock. Aurélie Goulielmakis A. Julen Bridier J. Aurélien Barré A. Olivier Claisse O. David James Sherman D. J. Pascal Durrens P. Aline Lonvaud-Funel A. Elisabeth Bon E. OENO2011- 9th International Symposium of Oenology Bordeaux, France Dunod, Paris January 2011 http:// hal. inria. fr/ hal-00646867/ en International Symposium of Oenology 9 OENO Génolevures: Policy for Automated Annotation of Genome Sequences Tiphaine Martin T. Pascal Durrens P. JOBIM 2011 Paris, France June 2011 http:// hal. inria. fr/ inria-00614485/ en Journées Ouvertes Biologie Informatique Mathématiques 12 JOBIM Un polymorphisme suspect Tiphaine Martin T. Pascal Durrens P. Unithé ou Café Talence, France June 2011 http:// hal. inria. fr/ inria-00614484/ en Inria Unithé ou Café 2011 Genolevures: automated annotation of yeast genome sequences Tiphaine Martin T. Comparative Genomics of Eukaryotic Microorganisms Sant Feliu de Guixols, Spain October 2011 http:// hal. inria. fr/ hal-00640571/ en Symposium on Comparative Genomics of Eukaryotic Microorganisms 2011 ESF-EMBO Genolevures : automated annotation of yeast genome sequences Tiphaine Martin T. David James Sherman D. J. Pascal Durrens P. 
Comparative Genomics of Eukaryotic Microorganisms Sant Feliu de Guixols, Spain October 2011 http:// hal. inria. fr/ hal-00640575/ en Symposium on Comparative Genomics of Eukaryotic Microorganisms 2011 ESF-EMBO Comparative annotation and scaling-out challenges for paraphyletic strategies David James Sherman D. J. Natalia Golenetskaya N. Tiphaine Martin T. Pascal Durrens P. EMBO Symposium on Comparative Genomics of Eukaryotic Microorganisms: Understanding the Complexity of Diversity San Feliu de Guixols, Spain EMBO October 2011 http:// hal. inria. fr/ hal-00652903/ en Symposium on Comparative Genomics of Eukaryotic Microorganisms 2011 ESF-EMBO Addressing scaling-out challenges for comparative genomics David James Sherman D. J. Natalia Golenetskaya N. Moscow Conference on Computational Molecular Biology Moscow, Russian Federation July 2011 http:// hal. inria. fr/ hal-00649189/ en Moscow Conference on Computational Molecular Biology 2011 MCCMB Reconciling competing models: a case study of wine fermentation kinetics Rodrigo Assar R. Felipe Vargas F. David James Sherman D. J. Katsuhisa Horimoto K. Masahiko Nakatsui M. Nikolaj Popov N. Algebraic and Numeric Biology 2010 Austria Hagenberg Research Institute for Symbolic Computation, Johannes Kepler University of Linz 08 2010 68–83 http:// hal. inria. fr/ inria-00541215/ en CL Finding functional features in Saccharomycesgenomes by phylogenetic footprinting Paul Cliften P. P. Sudarsanam P. A. Desikan A. L. Fulton L. Robert S Fulton R. S. J. Majors J. R. Waterston R. B. A. Cohen B. A. Mark Johnston M. Science 301 2003 71–76 MapReduce: simplified data processing on large clusters J Dean J. S Ghemawat S. Proceedings of the 6th conference on Symposium on Operating Systems Design and Implementation (OSDI'04) San Francisco, CA 2004 The Ashbya gossypii genome as a tool for mapping the ancient Saccharomyces cerevisiae genome F. S. Dietrich F. S. et al. 
Science 304, 2004, 304–7.
The lipases from Yarrowia lipolytica: genetics, production, regulation, biochemical characterization and biotechnological applications. P. Fickers, A. Marty, Jean-Marc Nicaud. Biotechnol Adv. 29(6), Nov-Dec 2011, 632–44.
Principled design of the modern Web architecture. R. Fielding, R. N. Taylor. ACM Trans. Internet Technol. 2, 2002, 115–150.
Mixed-formalism hierarchical modeling and simulation with BioRica. Alice Garcia, David James Sherman. 11th International Conference on Systems Biology (ICSB 2010), Edinburgh, United Kingdom, Oct. 2010, P02.446. Poster. http://hal.inria.fr/inria-00529669/en
A community standard format for the representation of protein affinity reagents. David Gloriam, Sandra Orchard, Daniela Bertinetti, Erik Björling, Erik Bongcam-Rudloff, Carl Borrebaeck, Julie Bourbeillon, Andrew R. M. Bradbury, Antoine De Daruvar, Stefan Dübel, Ronald Frank, Toby J. Gibson, Larry Gold, Niall Haslam, Friedrich W. Herberg, Tara Hiltker, Jörg D. Hoheisel, Samuel Kerrien, Manfred Koegl, Zoltán Konthur, Bernhard Korn, Ulf Landegren, Luisa Montecchi-Palazzi, Sandrine Palcy, Henry Rodriguez, Sonja Schweinsberg, Volker Sievert, Oda Stoevesandt, Michael J. Taussig, Marius Ueffing, Mathias Uhlén, Silvère Van Der Maarel, Christer Wingren, Peter Woollard, David James Sherman, Henning Hermjakob. Mol Cell Proteomics 9(1), Jan. 2010, 1–10. http://hal.inria.fr/inria-00544751/en
The HUPO PSI's molecular interaction format–a community standard for the representation of protein interaction data. Henning Hermjakob, Luisa Montecchi-Palazzi, G. Bader, J. Wojcik, L. Salwinski, A. Ceol, S. Moore, Sandra Orchard, U. Sarkans, C. von Mering, B. Roechert, S. Poux, E. Jung, H. Mersch, P. Kersey, M. Lappe, Y. Li, R. Zeng, D. Rana, Macha Nikolski, H. Husi, C. Brun, K. Shanker, S. G. Grant, C. Sander, P. Bork, W. Zhu, A. Pandey, A. Brazma, B. Jacq, M. Vidal, David James Sherman, P. Legrain, G. Cesareni, I. Xenarios, D. Eisenberg, B. Steipe, C. Hogue, R. Apweiler. Nat. Biotechnol. 22(2), Feb. 2004, 177–83.
IntAct: an open source molecular interaction database. Henning Hermjakob, Luisa Montecchi-Palazzi, C. Lewington, S. Mudali, Samuel Kerrien, Sandra Orchard, M. Vingron, B. Roechert, P. Roepstorff, A. Valencia, H. Margalit, J. Armstrong, A. Bairoch, G. Cesareni, David James Sherman, R. Apweiler. Nucleic Acids Res. 32, Jan. 2004, D452–5.
Mining the semantics of genome super-blocks to infer ancestral architectures. Géraldine Jean, David James Sherman, Macha Nikolski. Journal of Computational Biology, 2009. http://hal.inria.fr/inria-00414692/en/
Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. M. Kellis, B. W. Birren, E. S. Lander. Nature 428, 2004, 617–24.
Sequencing and comparison of yeast species to identify genes and regulatory elements. M. Kellis, N. Patterson, M. Endrizzi, B. W. Birren, E. S. Lander. Nature 423, 2003, 241–254.
Eucaryotic genome evolution through the spontaneous duplication of large chromosomal segments. Romain Koszul, S. Caburet, Bernard Dujon, Gilles Fischer. EMBO Journal 23(1), 2004, 234–43.
Genome-scale Metabolic Reconstruction of the Eukaryote Cell Factory Yarrowia Lipolytica. Nicolás Loira, Thierry Dulermo, Macha Nikolski, Jean-Marc Nicaud, David James Sherman. 11th International Conference on Systems Biology (ICSB 2010), Edinburgh, United Kingdom, Oct. 2010, P02.602. Poster. http://hal.inria.fr/hal-00652922/en
Reconstruction and Validation of the genome-scale metabolic model of Yarrowia lipolytica iNL705. Nicolás Loira, David James Sherman, Pascal Durrens. Journée Ouvertes Biologie Informatique Mathématiques, JOBIM 2010, Montpellier, France, Sept. 2010. http://www.jobim2010.fr/?q=fr/node/55
Genetic improvement of thermo-tolerance in wine Saccharomyces cerevisiae strains by a backcross approach. Philippe Marullo, C. Mansour, M. Dufour, W. Albertin, D. Sicard, Marina Bely, D. Dubourdieu. FEMS Yeast Res 9(8), Dec. 2009, 1148–60.
Single QTL mapping and nucleotide-level resolution of a physiologic trait in wine Saccharomyces cerevisiae strains. Philippe Marullo, Gael Yvert, Marina Bely, Isabelle Masneuf-Pomarède, Pascal Durrens, Michel Aigle. FEMS Yeast Res. 7(6), 2007, 941–52.
Molecular typing of wine yeast strains Saccharomyces uvarum using microsatellite markers. Isabelle Masneuf-Pomarède, C. Lejeune, Pascal Durrens, M. Lollier, Michel Aigle, D. Dubourdieu. Syst. Appl. Microbiol. 30(1), 2007, 75–82.
Qualitative Transition Systems for the Abstraction and Comparison of Transient Behavior in Parametrized Dynamic Models. Hayssam Soueidan, Macha Nikolski, Gregoire Sutre. Computational Methods in Systems Biology (CMSB'09), Bologna, Italy, vol. 5688, Springer Verlag, 2009, 313–327. http://hal.archives-ouvertes.fr/hal-00408909/en/
The Generic Genome Browser: A building block for a model organism system database. L. D. Stein. Genome Res. 12, 2002, 1599–1610.
Swarming Along the Evolutionary Branches Sheds Light on Genome Rearrangement Scenarios. Nikolay Vyahhi, Adrien Goëffon, David James Sherman, Macha Nikolski, Franz Rothlauf. ACM SIGEVO Conference on Genetic and evolutionary computation, ACM, 2009. http://hal.inria.fr/inria-00407508/en/
Characterization of an acquired-dps-containing gene island in the lactic acid bacterium Oenococcus oeni. A. Athane, Eric Bilhère, Elisabeth Bon, Patrick Lucas, Guillaume Morel, Aline Lonvaud-Funel, Claire Le Hénaff-Le Marrec. Journal of Applied Microbiology, 2008 (in press; received 22 October 2007, revised 8 April 2008, accepted 8 May 2008). http://hal.inria.fr/inria-00340058/en/
New strategy for the representation and the integration of biomolecular knowledge at a cellular scale. Roland Barriot, Jerome Poix, Alexis Groppi, Aurelien Barre, Nicolas Goffard, David James Sherman, Isabelle Dutour, Antoine De Daruvar. Nucleic Acids Research (NAR) 32, 2004, 3581–9. http://hal.inria.fr/inria-00202722/en/
Insights into genome plasticity of the wine-making bacterium Oenococcus oeni strain ATCC BAA-1163 by decryption of its whole genome. Elisabeth Bon, Cosette Granvalet, Fabienne Remize, Diliana Dimova, Patrick Lucas, Daniel Jacob, Alexis Groppi, Stéphanie Penaud, Christophe Aulard, Antoine De Daruvar, Aline Lonvaud-Funel, Jean Guzzo. 9th Symposium on Lactic Acid Bacteria, Egmond aan Zee, Netherlands, 2008. http://hal.inria.fr/inria-00340073/en/
Exploratory Simulation of Cell Ageing Using Hierarchical Models. Maria Cvijovic, Hayssam Soueidan, David James Sherman, Edda Klipp, Macha Nikolski, J. Arthur, S.-K. Ng. 19th International Conference on Genome Informatics, Gold Coast, Queensland, Australia. Genome Informatics 21, Imperial College Press, London, 2008, 114–125. http://hal.inria.fr/inria-00350616 Funding: EU FP6 Yeast Systems Biology Network LSHG-CT-2005-018942, EU Marie Curie Early Stage Training (EST) Network “Systems Biology”, ANR-05-BLAN-0331-03 (GENARISE).
The whole genome of Oenococcus strain IOEB 8413. Diliana Dimova, Elisabeth Bon, Patrick Lucas, R. Beugnot, Marcel De Leeuw, Aline Lonvaud-Funel. 9th Symposium on Lactic Acid Bacteria, Egmond aan Zee, Netherlands, 2008. http://hal.inria.fr/inria-00340086/en/
Broadening the Horizon - Level 2.5 of the HUPO-PSI Format for Molecular. Samuel Kerrien, Sandra Orchard, Luisa Montecchi-Palazzi, B. Aranda, A. Quinn, N. Vinod, G. Bader, I. Xenarios, J. Wojcik, David James Sherman, M. Tyers, J. Salama, S. Moore, A. Ceol, A. Chatr-Aryamontri, M. Oesterheld, V. Stumpflen, L. Salwinski, J. Nerothin, E. Cerami, M. Cusick, M. Vidal, M. Gilson, J. Armstrong, Peter Woollard, C. Hogue, D. Eisenberg, G. Cesareni, R. Apweiler, Henning Hermjakob. BMC Biology 5(1):44, Oct. 2007. http://hal.archives-ouvertes.fr/hal-00306554/en/
Génolevures: comparative genomics and molecular evolution of hemiascomycetous yeasts. David James Sherman, Pascal Durrens, Emmanuelle Beyne, Macha Nikolski, Jean-Luc Souciet. Nucleic Acids Research (NAR) 32, 2004, D315–8. http://hal.inria.fr/inria-00407519/en/ GDR CNRS 2354 “Génolevures”.
Genolevures complete genomes provide data and tools for comparative genomics of hemiascomycetous yeasts. David James Sherman, Pascal Durrens, Florian Iragne, Emmanuelle Beyne, Macha Nikolski, Jean-Luc Souciet. Nucleic Acids Res 34 (Database issue), Jan. 2006, D432–5. http://hal.archives-ouvertes.fr/hal-00118142/en/
Databases and Ontologies for Affinity Binders. David James Sherman, Natalia Golenetskaya. May 2010. http://hal.inria.fr/inria-00563531/en/ Overview of advances in defining ontologies and building knowledge bases for affinity binders, over the four years of the ProteomeBinders project. Presented at the Affinomics/ProteomeBinders workshop at the Møller Center, Churchill College, Cambridge University.
BioRica: A multi model description and simulation system. Hayssam Soueidan, David James Sherman, Macha Nikolski. FOSBE, Germany, 2007, 279–287. http://hal.archives-ouvertes.fr/hal-00306550/en/