

Section: Partnerships and Cooperations

International projects and collaborations

Fapemig INRIA Project: Incorporating knowledge models into scalable data mining algorithms

Participants : Mehdi Kaytoue, Amedeo Napoli [contact person], Chedy Raïssi.

This Fapemig-INRIA research project involves researchers at the Universidade Federal de Minas Gerais (UFMG) in Belo Horizonte, in a group led by Prof. Wagner Meira, and the Orpailleur team at INRIA Nancy Grand Est. The project addresses the mining of large amounts of data and targets two application scenarios in which this problem arises. The first is text mining, i.e. extracting knowledge from texts and categorizing documents. The second is graph mining, i.e. determining relationship-based patterns and using these relations to perform classification tasks. In both cases, the computational complexity is high, either because of the high dimensionality of the data or because of the complexity of the patterns to be mined.

One strategy to ease such data mining tasks is to use existing knowledge to restrict the search space and to assess the quality of the patterns found. This knowledge may be formalized in ontologies, but also in other ways, whose study is a first research issue in this project. Once knowledge models can be built, we need to determine how to use them, which is a second major research issue. In particular, we want to design and evaluate mechanisms that exploit existing knowledge to improve data mining algorithms.

Finally, the computational complexity of the algorithms remains a major issue, and we intend to address it through parallel algorithms. Data mining algorithms are in general challenging to parallelize because they are irregular and intensive in both computation and communication. Accordingly, in a first joint work, we developed a new parallel algorithm for building skycubes, based on the Anthill framework developed at UFMG. The paper was presented at a local Brazilian conference, and an extended journal version will appear in a 2012 special issue of the International Journal of Parallel Programming.
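
For readers unfamiliar with skycubes, the following is a minimal sequential sketch of the notion itself, not of the parallel Anthill-based algorithm, whose design is not described here. A skycube is the collection of skylines of a dataset over every non-empty subset of its dimensions. The sketch assumes that smaller values are preferred on every dimension; the function names and the sample points are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def dominates(p, q, dims):
    """Return True if p dominates q on dims (assumption: smaller is better).

    p dominates q when p is no worse than q on every dimension in dims
    and strictly better on at least one of them.
    """
    return (all(p[d] <= q[d] for d in dims)
            and any(p[d] < q[d] for d in dims))


def skyline(points, dims):
    """Naive quadratic skyline: keep the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p, dims) for q in points)]


def skycube(points, n_dims):
    """Map each non-empty subset of dimensions to the skyline on that subset."""
    cube = {}
    for size in range(1, n_dims + 1):
        for dims in combinations(range(n_dims), size):
            cube[dims] = skyline(points, dims)
    return cube


# Hypothetical two-dimensional data, e.g. (price, distance).
points = [(1, 4), (2, 2), (4, 1), (3, 3)]
cube = skycube(points, 2)
for dims, sky in cube.items():
    print(dims, sky)
```

The number of subspaces grows as 2^d - 1 with the number of dimensions d, which illustrates why the computation becomes expensive. In this naive formulation each subspace is computed independently of the others.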

Search for anti-HIV drugs acting as entry-blockers

Participants : Thomas Bourquard, Marie-Dominique Devignes, Anisah Ghoorah, Lazaros Mavridis, Violeta Pérez-Nueno, Dave Ritchie, Malika Smaïl-Tabbone, Vishwesh Venkatraman.

In collaboration with computational chemistry colleagues at the University of Bari and the Institut Químic de Sarrià (IQS) in Barcelona, Dave Ritchie has published reviews of the state of in silico protein structure modeling and virtual drug screening techniques for CCR5 [87] and CXCR4 [111] entry-blocking molecules. As several hundred such entry-blockers now exist, there is considerable interest in the chemoinformatics community in how best to use knowledge of known drug molecules to develop new and more potent drug candidates [112]. The spherical harmonic clustering approach developed by Dave Ritchie and Violeta Pérez-Nueno was recently used successfully in a virtual screening study at the IQS to discover new high-affinity ligands for CXCR4 [109].

International collaborations in Mining complex data

Participants : Isiru Bayissa, Adrien Coulet, Mehdi Kaytoue, Amedeo Napoli, Chedy Raïssi.

A first collaboration involves “Université du Québec à Montréal” (UQAM) in Montréal with Prof. Petko Valtchev and Laboratoire LIRMM in Montpellier with Prof. Marianne Huchard. This collaboration is supported by a CNRS PICS project (2011-2014) called “Concept Analysis driving Ontology Engineering”, abbreviated as “CAdOE”. The research work within this project aims at defining and implementing a semi-automatic methodology supporting ontology engineering, based on the joint use of Formal Concept Analysis (FCA) and Relational Concept Analysis (RCA). Some elements of this methodology already exist and were used in text mining [85], [84]. However, this initial methodology should be completed and improved, especially regarding its applicability to complex data and its interoperability with knowledge representation modules. This year, several publications have already appeared and others are in preparation for next year [36], [56], [75].

A second collaboration involves Sergei Kuznetsov at the Higher School of Economics (HSE) in Moscow. Mehdi Kaytoue and Amedeo Napoli visited the HSE laboratory in July 2010 with support from the Poncelet Laboratory in Moscow, a joint CNRS-INRIA laboratory. This visit was the occasion to prepare a number of publications, including one in a first-rank conference in Artificial Intelligence [5], together with other important publications [49], [33], [48]. The collaboration is ongoing and substantial research work remains to be done. This year, Amedeo Napoli visited the HSE laboratory in June 2011, and Sergei Kuznetsov visited Loria in October 2011.

A third collaboration, a PHC Zenon project, exists with Florent Domenach, associate professor at the University of Nicosia in Cyprus. This project is entitled “Knowledge Discovery for Complex Data in Formal and Relational Concept Analysis” (KD4CD) and aims at studying and combining different types of classification processes in the framework of FCA. These processes can be based on Galois connections, but also on so-called “overhangings”, i.e. a kind of generalization of closure systems. Another focus is consensus theory, where the objective is to find the best classification of a set of objects according to a quality measure (this could be applied to ontologies). This year, there were two visits from France to Cyprus, in May and December 2011, and one visit from Cyprus to France in October 2011.
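
As an illustration of classification based on a Galois connection in FCA, here is a minimal sketch, assuming a formal context given as a mapping from objects to their attribute sets. The two derivation operators form the Galois connection, and a formal concept is a pair (extent, intent) in which each component is the image of the other. The function names and the toy context are hypothetical, and the brute-force enumeration is only meant for very small contexts; it does not reflect the algorithms studied in the project.

```python
from itertools import combinations


def common_attributes(context, objects):
    """Derivation operator on objects: attributes shared by all given objects."""
    shared = set().union(*context.values())  # start from every attribute
    for obj in objects:
        shared &= context[obj]
    return shared


def common_objects(context, attributes):
    """Derivation operator on attributes: objects having all given attributes."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all formal concepts by closing every subset of objects.

    Exponential in the number of objects: a sketch for toy contexts only.
    """
    objects = sorted(context)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(context, set(subset))
            extent = common_objects(context, intent)  # closure of the subset
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


# Hypothetical toy context: animals described by binary attributes.
context = {
    "duck": {"flies", "swims"},
    "eagle": {"flies"},
    "trout": {"swims"},
}
for extent, intent in sorted(formal_concepts(context),
                             key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

Applying the two operators in sequence yields a closure operator, so the extents obtained this way form a closure system, which is the structure that overhangings are said above to generalize.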