

Section: Partnerships and Cooperations

National Initiatives

ANR

HEREDIA

Participant: Jean-Sébastien Sereni [contact person].

HEREDIA (http://www.liafa.univ-paris-diderot.fr/~sereni/Heredia/) is an ANR JCJC (“Jeunes Chercheurs”) project focusing on hereditary properties of graphs, which provide a general perspective for studying graph properties. Several important general theorems are known, and the approach offers an elegant way of unifying notions and proof techniques. Further, hereditary classes of graphs play a central role in graph theory. Besides their theoretical appeal, they are also particularly relevant from an algorithmic point of view. In addition to Jean-Sébastien Sereni, the HEREDIA project involves Pierre Charbit (LIAFA, Paris), Louis Esperet (G-SCOP, Grenoble) and Nicolas Trotignon (LIP, Lyon).

Hybride

Participants: Luis-Felipe Melo, Amedeo Napoli, Chedy Raïssi, My Thao Tang, Mohsen Sayed, Yannick Toussaint [contact person].

The Hybride research project (http://hybride.loria.fr/) aims at developing new methods and tools for supporting knowledge discovery from textual data by combining methods from Natural Language Processing (NLP) and Knowledge Discovery in Databases (KDD). A key idea is to design an interactive and convergent process in which NLP methods are used for guiding text mining and KDD methods are used for analyzing textual documents. NLP methods are mainly based on text analysis and on the extraction of general and temporal information. KDD methods are based on pattern mining (e.g. itemsets and sequences), formal concept analysis and its variations, and graph mining. In this way, NLP methods applied to some texts locate “textual information” that KDD methods can use as constraints for focusing the mining of textual data. Conversely, KDD methods can extract itemsets or sequences that can be used for guiding information extraction from texts and text analysis. The experimental and validation parts of the Hybride project rely on an application to the documentation of rare diseases in the context of Orphanet.
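The idea of using textual information located by NLP as a constraint on pattern mining can be illustrated with a minimal sketch. It is not the Hybride implementation: the function name `constrained_frequent_itemsets`, the toy sentences, and the anchor term `disease_X` are all hypothetical, and the brute-force enumeration is only suitable for tiny examples.

```python
from collections import Counter
from itertools import combinations


def constrained_frequent_itemsets(transactions, must_contain, min_support, max_size=3):
    """Count itemsets that include every item of `must_contain`.

    `transactions` is a list of sets of terms (e.g. one set per sentence).
    `must_contain` plays the role of a constraint supplied by an NLP step
    (hypothetical here). Brute-force enumeration, for illustration only.
    """
    counts = Counter()
    for items in transactions:
        if not must_contain <= items:
            continue  # the constraint cannot be satisfied in this transaction
        for size in range(1, max_size + 1):
            for itemset in combinations(sorted(items), size):
                if must_contain <= set(itemset):
                    counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical toy data: terms co-occurring in three sentences.
sentences = [
    {"fever", "rash", "disease_X"},
    {"fever", "disease_X"},
    {"rash", "cough"},
]
frequent = constrained_frequent_itemsets(sentences, {"disease_X"}, min_support=2)
```

With these toy sentences, only itemsets containing `disease_X` and occurring in at least two sentences are kept, which is how a constraint narrows the search space.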

The partners of the Hybride consortium are the GREYC Caen laboratory (pattern mining, NLP, text mining), the MoDyCo Paris laboratory (NLP, linguistics), the INSERM Paris laboratory (Orphanet, ontology design), and the Orpailleur team at Inria NGE (FCA, knowledge representation, pattern mining, text mining).

ISTEX

Participants: Luis-Felipe Melo, Amedeo Napoli, Yannick Toussaint [contact person].

ISTEX is a so-called “Initiative d'excellence” managed by CNRS and DIST (“Direction de l'Information Scientifique et Technique”). ISTEX aims at giving the research and teaching community online access to scientific publications in all domains. ISTEX is therefore concerned with the large-scale acquisition of documentation such as journals, proceedings, corpora, and databases. ISTEX-R is one research project within ISTEX in which the Orpailleur team is involved, together with two other partners, namely the ATILF laboratory and the INIST Institute (both in Nancy). ISTEX-R aims at developing a new generation of tools for querying full-text documentation, analyzing its content, and extracting information and knowledge units. A platform is currently under development to provide robust NLP tools for text processing, as well as methods for text mining and domain conceptualization.

Kolflow

Participants: Jean Lieber [contact person], Alice Hermann, Amedeo Napoli, Emmanuel Nauer, My Thao Tang, Yannick Toussaint.

Kolflow (http://kolflow.univ-nantes.fr/) is a 3-year basic research project taking place from February 2011 to July 2014, funded by the French National Agency for Research (ANR) under the ANR CONTINT program. The aim of the project is to investigate man-machine collaboration in continuous knowledge-construction flows.

Kolflow partners are GDD (LINA Nantes), Silex (LIRIS Lyon), Orpailleur (Inria NGE/LORIA), Score (Inria NGE/LORIA), and Wimmics (Inria Sophia Antipolis).

PEPSI: Polynomial Expansions of Protein Structures and Interactions

Participants: David Ritchie, Marie-Dominique Devignes, Malika Smaïl-Tabbone.

The PEPSI (“Polynomial Expansions of Protein Structures and Interactions”) project (http://pepsi.gforge.inria.fr) is a collaboration with Sergei Grudinin at Inria Grenoble (project Nano-D) and Valentin Gordeliy at the Institut de Biologie Structurale (IBS) in Grenoble. This four-year project, funded by the ANR “Modèles Numériques” program, involves developing computational protein modeling and docking techniques and using them to help solve the structures of large molecular systems experimentally.

Termith

Participants: Luis-Felipe Melo, Yannick Toussaint [contact person].

Termith (http://www.atilf.fr/ressources/termith/) is an ANR project which involves the following laboratories: ATILF, LIDILEM, LINA, INIST, Inria Saclay and Inria Nancy Grand Est. It aims at indexing documents belonging to different domains of the humanities. To this end, the project focuses on extracting term candidates (information extraction) and on disambiguation.

In the Orpailleur team, we are mainly concerned with information extraction using Formal Concept Analysis techniques, and also with itemset or sequence extraction. The objective is to define “contexts introducing terms”, i.e. to find textual environments that allow a system to decide whether a textual element is actually a term and, if so, to which domain it belongs.
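To make the role of Formal Concept Analysis more concrete, here is a minimal sketch of how formal concepts can be derived from a small binary table relating candidate terms to textual environments. The toy context, the attribute names (`defined_as`, `italicized`, `in_title`), and the function names are hypothetical and are not taken from the Termith project; the enumeration is exponential and only meant for tiny examples.

```python
from itertools import combinations

# Hypothetical formal context: candidate terms (objects) described by the
# textual environments (attributes) in which they were observed.
context = {
    "formal concept": {"defined_as", "italicized"},
    "lattice": {"defined_as", "italicized", "in_title"},
    "approach": {"in_title"},
}


def common_attributes(objects, context):
    """Attributes shared by all given objects (all attributes if none given)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def common_objects(attributes, context):
    """Objects that have all the given attributes."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every subset of objects."""
    objects = sorted(context)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context)
            extent = common_objects(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts(context)
```

Each resulting pair groups a maximal set of candidate terms with the maximal set of environments they share. In this toy table, "formal concept" and "lattice" are grouped by the environments `defined_as` and `italicized`, which is the kind of grouping that could suggest a “context introducing terms”.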

Trajcan: a study of patient care trajectories

Participants: Elias Egho, Nicolas Jay [contact person], Amedeo Napoli, Chedy Raïssi.

Over the past 30 years, many patient classification systems (PCS) have been developed. These systems aim at classifying care episodes into groups according to different patient characteristics. In France, the so-called “Programme de Médicalisation des Systèmes d'Information” (PMSI) is a nationwide PCS in use in every hospital. It systematically collects data about millions of hospitalizations. Though it is used for funding purposes, it includes useful information for public health domains such as epidemiology or health care planning.

The objective of the Trajcan project is to represent and analyze “patient care trajectories” (for patients suffering from cancer, limited to breast, colon, rectum, and lung cancers) and the associated health care. The data, drawn from the PMSI, concern patients receiving hospital care in the “Bourgogne” region. Such an analysis involves various data, e.g. type of cancer, number of visits, type of stays, hospitalization services and therapies used, and demographic factors, i.e. age, gender, place of residence.

One thesis is currently being carried out on this subject. Its objective is to design a knowledge discovery system working on multidimensional and sequential data for characterizing Patient Care Trajectories (PCT) [52], [62]. This thesis combines knowledge discovery and knowledge representation methods for improving the definition of patient care trajectories as temporal objects (sequential data mining). The overall objective is to improve decision support and health care, for example by detecting typical or exceptional trajectories in order to plan health care precisely for a given population.
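As an illustration of what “typical” trajectories can mean in sequential data mining, the following minimal sketch computes the support of a sequential pattern, i.e. the fraction of trajectories in which the pattern's events occur in the given order. It is a simplification that assumes each trajectory is a plain list of single events, whereas the thesis works on multidimensional data; the event names and the toy trajectories are hypothetical and are not PMSI data.

```python
def is_subsequence(pattern, trajectory):
    """True if the events of `pattern` occur in `trajectory` in the same order,
    possibly with other events in between."""
    remaining = iter(trajectory)
    # `event in remaining` advances the iterator, so order is respected.
    return all(event in remaining for event in pattern)


def support(pattern, trajectories):
    """Fraction of trajectories that contain `pattern` as a subsequence."""
    matches = sum(is_subsequence(pattern, t) for t in trajectories)
    return matches / len(trajectories)


# Hypothetical toy trajectories: one list of care events per patient.
trajectories = [
    ["consultation", "surgery", "chemotherapy"],
    ["consultation", "chemotherapy", "surgery"],
    ["consultation", "surgery", "radiotherapy", "chemotherapy"],
    ["surgery"],
]

surgery_then_chemo = support(["surgery", "chemotherapy"], trajectories)
```

A pattern whose support exceeds a chosen threshold could be regarded as typical for the population, while trajectories matching only low-support patterns could be flagged as exceptional.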

In parallel, Formal Concept Analysis techniques were used in conjunction with regression tree analysis to produce semi-automated classification of PCTs in the field of breast cancer in France [27].

Other National Initiatives and Collaborations

PEPS Cryo-CA

Participant: David Ritchie [Inria Nancy].

Cryo-CA is a two-year PEPS project (“Projets exploratoires pluridisciplinaires”) funded by CNRS, involving a collaboration with cryo-electron microscopy experimentalists at the IGBMC (“Institut de Génétique et de Biologie Moléculaire et Cellulaire”) in Strasbourg. The people involved in the project with David Ritchie are Sergei Grudinin (Inria Grenoble), Annick Dejaegere (IGBMC, Strasbourg), and Patrick Schultz (IGBMC, Strasbourg). The aim of the project is to encourage collaborations between experimentalists and computer scientists in order to advance the state of the art of computational algorithms in structural biology.

Towards the discovery of new nonribosomal peptides and synthetases

We have initiated a collaboration with researchers from the LIFL and Université Lille Nord de France. We collaborated on the NRPS toolbox [109]. Data were cleaned and integrated from various public and specific analysis programs. The resulting database should facilitate the discovery of new nonribosomal peptides and synthetases.