Section: Bilateral Contracts and Grants with Industry

The BioIntelligence Project

Participants: Mehwish Alam, Yasmine Assess, Aleksey Buzmakov, Adrien Coulet, Marie-Dominique Devignes, Amedeo Napoli [contact person], Malika Smaïl-Tabbone.

The objective of the “BioIntelligence” project is to design an integrated framework for the discovery and development of new biological products. This framework takes into account all phases of product development, from molecular to industrial aspects, and is intended for use in the life science industry (pharmacy, medicine, cosmetics, etc.). The framework has to provide various tools and activities, such as: (1) a platform for searching and analyzing biological information (heterogeneous data, documents, knowledge sources, etc.), (2) knowledge-based models and processes for simulation and in silico biology, and (3) the management of all activities related to the discovery of new products in collaboration with the industrial laboratories (collaborative work, industrial process management, quality, certification). The “BioIntelligence” project is led by “Dassault Systèmes” and involves industrial partners such as Sanofi Aventis, Laboratoires Pierre Fabre, Ipsen, Servier, and Bayer Crops, together with two academic partners, Inserm and Inria. An annual project meeting usually takes place in Sophia-Antipolis at the beginning of July.

Two theses related to “BioIntelligence” are currently in progress in the Orpailleur team. The first studies possible combinations of mining methods on biological data. The mining methods considered are based on Formal Concept Analysis (FCA) and Relational Concept Analysis (RCA), itemset and association rule extraction, and inductive logic programming. These methods have their own strengths and provide different capabilities for extending domain ontologies. Particular attention will be paid to the integration of heterogeneous biological data and to the management of large volumes of biological data, guided by the domain knowledge contained in ontologies (linking data and knowledge units). Practical experiments will be carried out on biological data (clinical trial data and cohort data), in accordance with the ontologies available at the NCBO BioPortal.

The second thesis is based on an extension of FCA involving pattern structures on graphs. The idea is to extend the formalism of pattern structures to graphs and to apply the resulting framework to molecular structures. In this way, it will be possible to classify molecular structures and reactions by their content. This will help practitioners in information retrieval tasks involving molecular structures or the search for particular reactions. In addition, an experiment was carried out on combining supervised (distance-based clustering) and unsupervised (FCA) learning methods to predict the configuration of inhibitors of the c-Met protein (which is very active in cancer).

In addition, a forthcoming thesis will address ontology re-engineering in the domain of biology. The objective is to consider the content of the BioPortal ontologies (http://bioportal.bioontology.org/) and to design formal contexts and associated concept lattices that will support ontological schemas. Moreover, these ontological schemas will be completed using external resources such as Wikipedia as well as domain knowledge. The overall idea is to obtain definitions, and thus classification capabilities, for atomic or primitive concepts.
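
To make the notion of a formal context and its concepts more concrete, the following minimal Python sketch enumerates the formal concepts of a very small context. It is an illustration only and not the project's implementation: the objects and attributes are invented for the example, the function names are hypothetical, and the brute-force enumeration is only suitable for toy data.

```python
from itertools import combinations

# Hypothetical toy context (invented for illustration, not project data):
# each object is mapped to the set of attributes it carries.
context = {
    "enzyme": {"protein", "catalytic"},
    "kinase": {"protein", "catalytic", "phosphorylates"},
    "antibody": {"protein", "binds_antigen"},
}


def common_attributes(objects, context, all_attributes):
    """Derivation on objects: the attributes shared by every given object."""
    shared = set(all_attributes)
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def matching_objects(attributes, context):
    """Derivation on attributes: the objects carrying all given attributes."""
    return frozenset(o for o, attrs in context.items() if attributes <= attrs)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of objects.

    This brute-force approach is exponential in the number of objects.
    """
    all_attributes = set().union(*context.values())
    objects = list(context)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context, all_attributes)
            extent = matching_objects(intent, context)
            concepts.add((extent, intent))
    return concepts


concepts = formal_concepts(context)
# Print concepts from the most general (largest extent) to the most specific.
for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

In this sketch, the intent of each concept can be read as a candidate definition of the class formed by its extent, and ordering the concepts by extent inclusion yields the concept lattice that would serve as a support for an ontological schema.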