Section: Application Domains

Life Sciences: Biology, Chemistry and Medicine

Participants: Adrien Coulet, Nicolas Jay, Joël Legrand, Jean Lieber, Pierre Monnin, Amedeo Napoli, Chedy Raïssi, Mohsen Sayed, Malika Smaïl-Tabbone, Yannick Toussaint, Mickaël Zehren.


Keywords: knowledge discovery in life sciences, bioinformatics, biology, chemistry, medicine, pharmacogenomics

One major application domain currently investigated by the Orpailleur team is the life sciences, with particular emphasis on biology, medicine, and chemistry. Understanding biological systems poses complex problems for computer scientists, and the solutions developed open new research directions for biologists and computer scientists alike. Indeed, interactions between researchers in biology and researchers in computer science improve knowledge not only about systems in biology, chemistry, and medicine, but also about computer science itself.

Knowledge discovery is gaining interest and importance in the life sciences, both for mining homogeneous databases, such as those of protein sequences and structures, and for mining heterogeneous databases to discover interactions between genes and environment, or between genetic and phenotypic data, especially in the public health and pharmacogenomics domains. The latter case is one of the main challenges for knowledge discovery in biology and involves knowledge discovery from complex data depending on domain knowledge.

Like biological data, chemical data present important challenges for knowledge discovery, for example in mining collections of molecular structures and collections of chemical reactions in organic chemistry. Mining such collections is an important task for several reasons, among them the challenge of graph mining and industrial needs (especially in drug design, pharmacology, and toxicology). Molecules and chemical reactions are complex data that can be modeled as undirected labeled graphs. Graph mining methods may play an important role in this framework, and Formal Concept Analysis (FCA) can also be used in an efficient and well-founded way [86]. Graph mining in the framework of FCA is a very important task on which we are working, and its results can be transferred to text mining as well.
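To make the graph representation concrete, here is a minimal Python sketch of molecules modeled as undirected labeled graphs, together with a very simple form of pattern mining: counting in how many molecules each labeled edge occurs. It only illustrates the data model and is not the team's graph mining or FCA method. The helper names (`make_molecule`, `edge_patterns`, `frequent_edge_patterns`), the hydrogen-suppressed encoding, and the three example molecules are assumptions chosen for illustration.

```python
from collections import Counter

def make_molecule(atoms, bonds):
    """Undirected labeled graph: atoms maps node id -> element symbol,
    bonds maps a pair of node ids -> bond label (hypothetical encoding)."""
    return {"atoms": dict(atoms), "bonds": dict(bonds)}

def edge_patterns(mol):
    """Return the set of labeled edges (atom, bond, atom) of a molecule.
    Atom labels are sorted so that the pattern ignores edge orientation."""
    patterns = set()
    for (u, v), bond in mol["bonds"].items():
        a, b = sorted((mol["atoms"][u], mol["atoms"][v]))
        patterns.add((a, bond, b))
    return patterns

def frequent_edge_patterns(molecules, min_support):
    """Keep the labeled edges occurring in at least min_support molecules."""
    counts = Counter()
    for mol in molecules:
        counts.update(edge_patterns(mol))  # each pattern counted once per molecule
    return {p: c for p, c in counts.items() if c >= min_support}

# Hydrogen-suppressed toy molecules (illustrative only).
ethanol = make_molecule({0: "C", 1: "C", 2: "O"},
                        {(0, 1): "single", (1, 2): "single"})
acetaldehyde = make_molecule({0: "C", 1: "C", 2: "O"},
                             {(0, 1): "single", (1, 2): "double"})
acetic_acid = make_molecule({0: "C", 1: "C", 2: "O", 3: "O"},
                            {(0, 1): "single", (1, 2): "double", (1, 3): "single"})

collection = [ethanol, acetaldehyde, acetic_acid]
common_to_all = frequent_edge_patterns(collection, min_support=3)
common_to_two = frequent_edge_patterns(collection, min_support=2)
```

With these toy molecules, only the carbon-carbon single bond is shared by all three graphs. Real graph mining searches for larger connected subgraphs rather than single edges, which is where the combinatorial difficulty mentioned above arises.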

We are also working on knowledge management in medicine and on the analysis of patient trajectories. The Kasimir research project is about decision support and knowledge management for the treatment of cancer. It is a multidisciplinary research project involving researchers in computer science (Orpailleur) and experts in oncology. For a given cancer localization, treatment is based on a protocol, which applies in 70% of the cases and directly provides a treatment. The remaining 30% of cases are “out of the protocol” (e.g. because of a contraindication or a treatment impossibility), and the protocol must then be adapted on the basis of discussions among specialists. This adaptation process is modeled in Kasimir using case-based reasoning (CBR), for which semantic web technologies have been used and adapted for several years.

The analysis of patient trajectories, i.e. the “path” of a patient during an illness (chronic illnesses and cancer), can be considered as an analysis of sequences. It is important to understand such sequential data, and sequence mining methods should be adapted to address the complex nature of medical events. We are interested in the analysis of trajectories at different levels of granularity and w.r.t. external domain ontologies. It is also important to be able to compare and classify trajectories according to their content. We are therefore also interested in defining similarity measures that take into account the complex nature of trajectories and that can be implemented efficiently, allowing quick and reliable classification.
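As an illustration of comparing trajectories at different levels of granularity, the following Python sketch treats a trajectory as a sequence of event codes and measures similarity as the length of the longest common subsequence divided by the length of the longer trajectory. Events can optionally be generalized to a parent concept before comparison. This is one possible measure under simplifying assumptions, not the measure used by the team. The event codes, the `PARENT` mapping (a stand-in for a domain ontology), and the function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences
    (standard dynamic programming, O(len(a) * len(b)) time)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def generalize(trajectory, parent):
    """Replace each event by its parent concept when one is known."""
    return [parent.get(event, event) for event in trajectory]

def similarity(t1, t2, parent=None):
    """LCS-based similarity in [0, 1]; if parent is given, events are
    first lifted to the coarser level of granularity."""
    if parent is not None:
        t1, t2 = generalize(t1, parent), generalize(t2, parent)
    if not t1 and not t2:
        return 1.0
    return lcs_length(t1, t2) / max(len(t1), len(t2))

# Hypothetical one-level ontology: specific event -> more general event.
PARENT = {
    "chemo_A": "chemotherapy",
    "chemo_B": "chemotherapy",
    "surgery_partial": "surgery",
    "surgery_total": "surgery",
}

patient_1 = ["diagnosis", "chemo_A", "surgery_partial", "follow_up"]
patient_2 = ["diagnosis", "chemo_B", "surgery_total", "follow_up"]

fine_grained = similarity(patient_1, patient_2)            # compares raw event codes
coarse_grained = similarity(patient_1, patient_2, PARENT)  # compares generalized events
```

In this toy example the two trajectories share only half of their events at the finest level but become identical once events are generalized, which shows why the level of granularity matters when classifying trajectories.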

PractiKPharma (Practice-based evidences for actioning Knowledge in Pharmacogenomics) is a research project, now starting, on the validation of state-of-the-art knowledge in pharmacogenomics by mining “Electronic Health Records” (EHRs) [55]. Pharmacogenomics is the field that studies how genomic variations impact drug responses. Most of the state of the art in the field is available only in the biomedical literature, with various levels of validation. Accordingly, we propose first to extract pharmacogenomic knowledge units from the literature, and second to confirm or moderate these units by mining EHRs. Comparing knowledge units extracted from the literature with facts extracted from EHRs is not a trivial task for several reasons, among which (i) the literature is in English, whereas EHRs are in French, (ii) EHRs represent observations at the patient level whereas the literature generalizes over sets of patients...