

Section: Partnerships and Cooperations

National Initiatives

ANR

ANR Hybride

Participants: Luis Felipe Melo, Amedeo Napoli, Chedy Raïssi, My Thao Tang, Yannick Toussaint [contact person].

The Hybride research project aims to develop new methods and tools for supporting knowledge discovery from textual data by combining methods from Natural Language Processing (NLP) and Knowledge Discovery in Databases (KDD). A key idea is to design an interactive and convergent process in which NLP methods guide text mining and KDD methods support the analysis of textual documents. NLP methods are mainly based on text analysis and on the extraction of general and temporal information, while KDD methods are based on pattern mining (e.g. itemsets and sequences), formal concept analysis and its variants, and graph mining. In this way, NLP methods applied to texts locate “textual information” that KDD methods can use as constraints for focusing the mining of textual data. Conversely, KDD methods can extract itemsets or sequences that can guide information extraction from texts and text analysis. This combination of NLP and KDD methods towards common objectives can be viewed as a continuous process, a sequence of complex NLP and KDD operations that reinforces itself through a feedback loop. The experimental and validation part of the Hybride project is an application to the documentation of rare diseases in the context of Orphanet.
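The tools developed in Hybride are not detailed here. As a generic illustration of the itemset mining mentioned above, the sketch below assumes that an NLP step has already reduced each document to a set of terms (the term labels are hypothetical) and performs a level-wise, Apriori-style search for term sets that occur in at least `min_support` documents. Candidate k-itemsets are built only from frequent (k-1)-itemsets, since any superset of an infrequent itemset is also infrequent.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets whose support,
    i.e. the number of transactions containing them, is >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s for s, c in counts.items() if c >= min_support}
    result = {s: counts[s] for s in current}

    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count the support of the surviving candidates.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                level[c] = support
        result.update(level)
        current = set(level)
        k += 1
    return result


# Hypothetical term sets extracted from four documents.
docs = [
    {"fever", "rash", "gene_X"},
    {"fever", "rash"},
    {"fever", "gene_X"},
    {"rash", "gene_X", "fever"},
]
for itemset, support in sorted(frequent_itemsets(docs, 3).items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With a threshold of 3, the pair {rash, gene_X} (support 2) is discarded, so the triple containing it is pruned without being counted. In an NLP/KDD loop of the kind described above, such frequent term sets would be candidates to feed back into information extraction.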

The fundamental aspects of the Hybride project can be understood through the main steps of the knowledge discovery loop from an NLP/KDD perspective: (i) data preparation, (ii) data mining, (iii) interpretation and validation of the results, and (iv) knowledge construction. At each step, new methods have to be designed to achieve this interrelated NLP/KDD loop. One expected outcome of the project is a system integrating the operations involved in the whole NLP/KDD loop, applied in the context of Orphanet to text analysis and to the production of new documentation on rare diseases. Implementing such a system combines several interrelated aspects, namely natural language processing, knowledge discovery, data mining, and knowledge engineering. This original combination remains a challenge in computer science.

The partners of the Hybride consortium are the GREYC Caen laboratory (pattern mining, NLP, text mining), the MoDyCo Paris laboratory (NLP, linguistics), the INSERM Paris laboratory (Orphanet, ontology design), and Inria NGE (FCA, knowledge representation, pattern mining, text mining).

ANR Kolflow

Participants: Jean Lieber [contact person], Amedeo Napoli, Emmanuel Nauer, Julien Stévenot, My Thao Tang, Yannick Toussaint.

Kolflow (http://kolflow.univ-nantes.fr/) is a three-year basic research project running from February 2011 to July 2014, funded by the French National Research Agency (ANR) under the ANR CONTINT programme. The project investigates man-machine collaboration in continuous knowledge-construction flows. The Kolflow partners are GDD (LINA Nantes), Silex (LIRIS Lyon), Orpailleur and Score (LORIA), and Wimmics (Inria Sophia Antipolis).

ANR PEPSI: Polynomial Expansions of Protein Structures and Interactions

Participants: Dave Ritchie, Marie-Dominique Devignes, Malika Smaïl-Tabbone.

The PEPSI (“Polynomial Expansions of Protein Structures and Interactions”) project is a collaboration with Sergei Grudinin at Inria Grenoble (project Nano-D) and Valentin Gordeliy at the Institut de Biologie Structurale (IBS) in Grenoble. This four-year project, funded by the ANR Modèles Numériques programme, involves developing computational protein modeling and docking techniques and using them to help experimentally solve the structures of large molecular systems (http://pepsi.gforge.inria.fr).

ANR Trajcan: a study of patient care trajectories

Participants: Elias Egho, Nicolas Jay [contact person], Amedeo Napoli, Chedy Raïssi.

Over the past 30 years, many patient classification systems (PCS) have been developed. These systems classify care episodes into groups according to various patient characteristics. In France, the so-called “Programme de Médicalisation des Systèmes d'Information” (PMSI) is a nationwide PCS in use in every hospital. It systematically collects data about millions of hospitalizations. Although it is used for funding purposes, it contains knowledge that is useful for other public health domains such as epidemiology or health care planning.

The objective of the Trajcan project is to represent and analyze “patient care trajectories” (for patients with cancer, restricted to breast, colon, rectum, and lung cancers) and the associated health care. The data, taken from the PMSI, concern patients receiving hospital care in the Bourgogne region. Such an analysis involves various data, e.g. type of cancer, number of visits, type of stays, hospitalization services and therapies used, and demographic factors such as age, gender, and place of residence.

A thesis currently under way on this subject aims to design a knowledge discovery system that works on multidimensional and sequential data for characterizing Patient Care Trajectories (PCT). The thesis combines knowledge discovery and knowledge representation methods to improve the definition of patient care trajectories as temporal objects (sequential data mining). The overall objective is to provide decision support for improving healthcare, for example by detecting typical or exceptional trajectories so that healthcare for a given population can be planned precisely. To discover groups of patients with similar health conditions, treatments, or journeys through the healthcare system, PCT are modeled as multilevel and multidimensional sequences of itemsets, using external knowledge on hospitals, medical procedures, and diagnoses. Accordingly, a new algorithm [42] has been developed to mine sequential patterns.
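The algorithm [42] itself is not reproduced here. The sketch below only illustrates the basic notion that sequential pattern mining over sequences of itemsets builds on. A trajectory is modeled as an ordered list of hospital stays, each stay being a set of items. A pattern (also a list of itemsets) is contained in a trajectory if each of its itemsets is a subset of a distinct stay, in the same order. Its support is the number of trajectories that contain it. The item labels are hypothetical, and the multilevel and multidimensional aspects (hierarchies on hospitals, procedures, and diagnoses) are deliberately left out.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def contains(trajectory: Sequence[Itemset], pattern: Sequence[Itemset]) -> bool:
    """True if each itemset of `pattern` is a subset of a distinct stay of
    `trajectory`, with the matched stays appearing in the same order.
    Greedy earliest matching is sufficient for this ordered embedding."""
    i = 0  # index of the next pattern itemset to match
    for stay in trajectory:
        if i < len(pattern) and pattern[i] <= stay:
            i += 1
    return i == len(pattern)


def support(database: Sequence[Sequence[Itemset]], pattern: Sequence[Itemset]) -> int:
    """Number of trajectories in `database` that contain `pattern`."""
    return sum(1 for t in database if contains(t, pattern))


# Hypothetical care trajectories: each stay is a set of (hospital, procedure) items.
trajectories: List[List[Itemset]] = [
    [frozenset({"hospital_A", "surgery"}),
     frozenset({"hospital_A", "chemotherapy"}),
     frozenset({"hospital_B", "radiotherapy"})],
    [frozenset({"hospital_B", "surgery"}),
     frozenset({"hospital_B", "chemotherapy"})],
    [frozenset({"hospital_A", "chemotherapy"}),
     frozenset({"hospital_A", "surgery"})],
]

surgery_then_chemo = [frozenset({"surgery"}), frozenset({"chemotherapy"})]
print(support(trajectories, surgery_then_chemo))
```

Here the pattern “surgery, then chemotherapy” occurs in the first two trajectories but not in the third, where the order is reversed. A sequential pattern miner enumerates such patterns and keeps those whose support reaches a chosen threshold.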

Other National Initiatives and Collaborations

PEPS Cryo-CA

Participant: Dave Ritchie [Inria Nancy].

Cryo-CA is a two-year PEPS project (Projets exploratoires pluridisciplinaires, i.e. exploratory multidisciplinary projects) funded by CNRS, involving a collaboration with cryo-electron microscopy experimentalists at the IGBMC (Institut de Génétique et de Biologie Moléculaire et Cellulaire) in Strasbourg. Besides Dave Ritchie, the people involved in the project are Sergei Grudinin (Inria Grenoble), Annick Dejaegere (IGBMC, Strasbourg), and Patrick Schultz (IGBMC, Strasbourg). The aim of the project is to encourage collaborations between experimentalists and computer scientists in order to advance the state of the art of computational algorithms in structural biology. In November 2012, a workshop funded by this project attracted some 60 participants (http://ccsb2012.loria.fr).

Towards the discovery of new nonribosomal peptides and synthetases

We have initiated a collaboration with researchers from LIFL and Université Lille Nord de France on the NRPS toolbox [57]. Data from various public sources and specific analysis programs were cleaned and integrated. The resulting database should facilitate knowledge discovery of new nonribosomal peptides and synthetases.