

Section: Partnerships and Cooperations

National Initiatives

ANR

HEREDIA

Participant: Jean-Sébastien Sereni [contact person].

HEREDIA (http://www.liafa.univ-paris-diderot.fr/~sereni/Heredia/) is an ANR JCJC (“Jeunes Chercheurs”, i.e. young researchers) project on hereditary properties of graphs, which provide a general framework for studying graph properties. Several important general theorems are known, and the approach offers an elegant way of unifying notions and proof techniques. Moreover, hereditary classes of graphs play a central role in graph theory. Besides their theoretical appeal, they are also particularly relevant from an algorithmic point of view. In addition to Jean-Sébastien Sereni, the HEREDIA project involves Pierre Charbit (LIAFA, Paris), Louis Esperet (G-SCOP, Grenoble) and Nicolas Trotignon (LIP, Lyon).

Hybride

Participants: Adrien Coulet, Luis-Felipe Melo, Amedeo Napoli, Matthieu Osmuk, Chedy Raïssi, My Thao Tang, Mohsen Sayed, Yannick Toussaint [contact person].

The Hybride research project (http://hybride.loria.fr/) aims at combining Natural Language Processing (NLP) and Knowledge Discovery in Databases (KDD) for text mining. A key idea is to design an interactive and convergent process in which NLP methods guide text mining and KDD methods guide the analysis of textual documents. NLP methods are mainly based on text analysis and on the extraction of general and temporal information. KDD methods are based on pattern mining (e.g. mining of patterns and sequences), formal concept analysis, and graph mining. In this way, NLP methods applied to texts extract “textual information” that KDD methods can use as constraints for focusing the mining of textual data. Conversely, KDD methods extract patterns and sequences that are used to guide information extraction and text analysis. Experimentation and validation within the Hybride project rely on an application to the documentation of rare diseases in the context of Orphanet.

The partners of the Hybride consortium are the GREYC Caen laboratory (pattern mining, NLP, text mining), the MoDyCo Paris laboratory (NLP, linguistics), the INSERM Paris laboratory (Orphanet, ontology design), and the Orpailleur team at Inria NGE (FCA, knowledge representation, pattern mining, text mining).

ISTEX

Participants: Luis-Felipe Melo, Amedeo Napoli, Yannick Toussaint [contact person].

ISTEX is a so-called “Initiative d'excellence” (Initiative of Excellence) managed by CNRS and DIST (“Direction de l'Information Scientifique et Technique”, i.e. Scientific and Technical Information Directorate). ISTEX aims to give the research and teaching community online access to scientific publications in all domains. To this end, ISTEX involves a massive acquisition of documentation such as journals, proceedings, corpora, and databases. ISTEX-R is one research project within ISTEX; the Orpailleur team is involved in it together with two other partners, the ATILF laboratory and the INIST Institute (both in Nancy). ISTEX-R aims at developing new tools for querying full-text documentation, analyzing content, and extracting information. A platform is currently under development to provide robust NLP tools for text processing, as well as methods for text mining and domain conceptualization.

Kolflow

Participants: Jean Lieber [contact person], Alice Hermann, Amedeo Napoli, Emmanuel Nauer, My Thao Tang, Yannick Toussaint.

Kolflow (http://kolflow.univ-nantes.fr/) is a 3-year basic research project running from February 2011 to November 2014, funded by the French National Research Agency (ANR) under the ANR CONTINT program. The aim of the project is to investigate human-machine collaboration in continuous knowledge-construction flows.

Kolflow partners are GDD (LINA Nantes), Silex (LIRIS Lyon), Orpailleur (Inria NGE/LORIA), Coast (Inria NGE/LORIA), and Wimmics (Inria Sophia Antipolis).

PEPSI: Polynomial Expansions of Protein Structures and Interactions

Participants: David Ritchie [contact person], Marie-Dominique Devignes, Malika Smaïl-Tabbone, Seyed Ziaeddin Alborzi.

The PEPSI (“Polynomial Expansions of Protein Structures and Interactions”) project is a collaboration with Sergei Grudinin at Inria Grenoble (project Nano-D) and Valentin Gordeliy at the Institut de Biologie Structurale (IBS) in Grenoble. This four-year project, funded by the ANR “Modèles Numériques” (Numerical Models) program, involves developing computational protein modeling and docking techniques and using them to help solve the structures of large molecular systems experimentally (http://pepsi.gforge.inria.fr).

Termith

Participants: Luis-Felipe Melo, Yannick Toussaint [contact person].

Termith (http://www.atilf.fr/ressources/termith/) is an ANR project involving the following laboratories: ATILF, LIDILEM, LINA, INIST, Inria Saclay and Inria Nancy Grand Est. It aims at indexing documents belonging to different domains of the humanities. To this end, the project focuses on extracting candidate terms (information extraction) and on disambiguation.

In the Orpailleur team, we are mainly concerned with information extraction using Formal Concept Analysis techniques, but also with pattern and sequence mining. The objective is to define “contexts introducing terms”, i.e. to find textual environments that allow a system to decide whether a textual element, together with its surrounding environment, is actually a candidate term.
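To illustrate the Formal Concept Analysis (FCA) machinery mentioned above, here is a minimal, generic sketch of a formal context, the two derivation operators, and a naive enumeration of formal concepts. It is not the team's Termith pipeline: the objects (candidate-term occurrences) and attributes (features of their textual environment, such as `preceded_by_det`) are hypothetical placeholders chosen only to show how shared environments group candidates into concepts.

```python
from itertools import combinations

# Hypothetical toy formal context: objects are candidate-term occurrences,
# attributes are invented features of their textual environment.
CONTEXT = {
    "occ1": {"preceded_by_det", "followed_by_prep"},
    "occ2": {"preceded_by_det", "in_title"},
    "occ3": {"preceded_by_det", "followed_by_prep", "in_title"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def extent(attrs):
    """Derivation B -> B': the objects having every attribute in attrs."""
    return frozenset(o for o, a in CONTEXT.items() if attrs <= a)


def intent(objs):
    """Derivation A -> A': the attributes shared by every object in objs."""
    if not objs:
        return frozenset(ATTRIBUTES)
    return frozenset(set.intersection(*(CONTEXT[o] for o in objs)))


def formal_concepts():
    """Enumerate all concepts (A, B) with A' = B and B' = A.

    Closes every subset of objects, which is exponential: suitable for a toy
    context only (real tools use algorithms such as NextClosure).
    """
    concepts = set()
    objects = list(CONTEXT)
    for k in range(len(objects) + 1):
        for subset in combinations(objects, k):
            b = intent(frozenset(subset))
            concepts.add((extent(b), b))
    return concepts


if __name__ == "__main__":
    for ext, itt in sorted(formal_concepts(), key=lambda c: len(c[0])):
        print(sorted(ext), sorted(itt))
```

Each concept pairs a set of candidate occurrences with exactly the environment features they all share, which is the kind of grouping that could help characterize “contexts introducing terms”.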

Trajcan: a study of patient care trajectories

Participants: Elias Egho, Nicolas Jay [contact person], Amedeo Napoli, Chedy Raïssi.

Over the past 30 years, many patient classification systems (PCS) have been developed. These systems aim at classifying care episodes into groups according to different patient characteristics. In France, the so-called “Programme de Médicalisation des Systèmes d'Information” (PMSI) is a nationwide PCS in use in every hospital. It systematically collects data about millions of hospitalizations. Although it is used for funding purposes, it includes information that is useful for public health domains such as epidemiology and healthcare planning.

The objective of the Trajcan project, which ended at the beginning of 2014, was to represent and analyze “patient care trajectories” of patients suffering from breast, colon, rectal, or lung cancer, together with the associated healthcare. The data concern patients receiving hospital care in the Bourgogne region and come from the PMSI. Such an analysis involves various data, e.g. type of cancer, number of visits, type of stay, hospitalization services, therapies used, and demographic factors such as age, gender, and place of residence.

Elias Egho defended a PhD thesis on this subject in July 2014 [15]. Combining knowledge discovery and knowledge representation methods to improve the definition of patients as temporal objects (sequential data mining), he developed several approaches for characterizing patient care trajectories (PCT). A first characterization is based on sequential pattern structures, which extend Formal Concept Analysis techniques to multidimensional sequential data. The second relies on an algorithm called MMISP (“Mining Multidimensional Itemsets Sequential Patterns”), which makes use of external knowledge to improve the mining process and discover sequential patterns at different levels of granularity [62]. Finally, a new similarity measure was developed for comparing sequences of itemsets and for applying clustering methods to group patients with similar healthcare trajectories. This latter work is the subject of a forthcoming publication in Data Mining and Knowledge Discovery.
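The approaches above treat a care trajectory as a sequence of itemsets (one itemset per hospital stay). As a minimal sketch of the basic notion they build on, the code below implements the standard containment test for sequential patterns over itemset sequences and the resulting support measure. It does not reproduce MMISP, its multidimensional items, or its use of external knowledge; the trajectories and item names (`hospA`, `chemo`, ...) are hypothetical.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def is_subsequence(pattern: Sequence[Itemset], trajectory: Sequence[Itemset]) -> bool:
    """True if the itemsets of `pattern` are included, in order, in
    strictly increasing positions of `trajectory` (standard containment)."""
    pos = 0
    for itemset in pattern:
        # Greedy earliest match: skip stays that do not contain the itemset.
        while pos < len(trajectory) and not itemset <= trajectory[pos]:
            pos += 1
        if pos == len(trajectory):
            return False
        pos += 1  # the next pattern itemset must match a strictly later stay
    return True


def support(pattern: Sequence[Itemset], database: List[Sequence[Itemset]]) -> float:
    """Fraction of trajectories in `database` that contain `pattern`."""
    if not database:
        return 0.0
    return sum(is_subsequence(pattern, t) for t in database) / len(database)


def seq(*itemsets) -> List[Itemset]:
    """Helper building a sequence of frozen itemsets from plain sets."""
    return [frozenset(s) for s in itemsets]


# Hypothetical trajectories: each itemset is one stay (hospital, care act).
TRAJECTORIES = [
    seq({"hospA", "surgery"}, {"hospA", "chemo"}, {"hospB", "radio"}),
    seq({"hospB", "chemo"}, {"hospB", "radio"}),
    seq({"hospA", "surgery"}, {"hospB", "radio"}),
]

if __name__ == "__main__":
    print(support(seq({"surgery"}, {"radio"}), TRAJECTORIES))
```

Greedy earliest matching is sufficient here: if any embedding of the pattern exists, matching each itemset as early as possible leaves the most room for the remaining ones. A frequent sequential pattern miner would enumerate candidate patterns and keep those whose support reaches a user-defined threshold.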

Other National Initiatives and Collaborations

Towards the discovery of new nonribosomal peptides and synthetases

We have initiated a collaboration with researchers from the LIFL and Université Lille Nord de France on the NRPS toolbox. Data produced by various public and specific analysis programs were cleaned and integrated. The resulting database should facilitate the discovery of new nonribosomal peptides and synthetases. Current results of this research collaboration were published in [21].

FUI Poqemon

Participant: Chedy Raïssi [contact person].

The POQEMON project aims at developing new pattern mining methods and tools for guiding knowledge discovery from mobile phone networks for monitoring purposes. The main idea is to develop sound approaches that handle the trade-off between data privacy and analytical power.