Knowledge discovery in databases (KDD) consists in processing large volumes of data to discover knowledge units that are significant and reusable. Likening knowledge units to gold nuggets, and databases to lands or rivers to be explored, the KDD process can be compared to the search for gold. This explains the name of the research team: in French, “orpailleur” denotes a person who searches for gold in rivers or mountains. The KDD process is based on three main operations: data preparation, data mining, and interpretation of the extracted units as knowledge units. Moreover, the KDD process is iterative, interactive, and generally controlled by an expert of the data domain, called the analyst. The analyst selects and interprets a subset of the extracted units to obtain knowledge units having a certain plausibility. In this view, KDD is an exploratory process similar to “exploratory data analysis”.
Just as a person searching for gold may have experience of the task and the terrain, the analyst may use general and domain knowledge to improve the whole KDD process. Accordingly, the KDD process may be associated with knowledge bases –or domain ontologies– related to the domain of the data, implementing knowledge discovery guided by domain knowledge (KDDK). In KDDK, extracted units may have “a life” after the interpretation step and become “actionable”: they are represented as knowledge units using a knowledge representation formalism and integrated within an ontology to be reused for problem-solving needs. In this way, knowledge discovery extends and updates existing knowledge bases, materializing a complementarity between knowledge discovery and knowledge engineering.
Keywords: knowledge discovery in databases, knowledge discovery in databases guided by domain knowledge, data mining, data exploration, formal concept analysis, classification, pattern mining, numerical methods in data mining.
Knowledge discovery in databases (KDD) aims at discovering intelligible and reusable patterns in possibly large databases. These patterns can then be interpreted as knowledge units to be reused in knowledge-based systems. From an operational point of view, the KDD process is based on three main steps: (i) selection and preparation of the data, (ii) data mining, and (iii) interpretation of the discovered patterns. Moreover, the KDD process is iterative, interactive, and generally controlled by an expert of the data domain, called the analyst. The analyst selects and interprets a subset of the extracted units to obtain knowledge units having a certain plausibility. In this view, KDD is an exploratory process similar to “exploratory data analysis”.
The KDD process –as implemented in the Orpailleur team– is based on data mining methods which are either symbolic or numerical. Symbolic methods are based on pattern mining (e.g. mining frequent itemsets, association rules, sequences...), Formal Concept Analysis (FCA) and extensions such as Pattern Structures and Relational Concept Analysis (RCA), and redescription mining. Numerical methods are based on Random Forests, Support Vector Machines (SVM), Neural Networks, and probabilistic approaches such as second-order Hidden Markov Models (HMM). Moreover, to deal with complex data, numerical data mining methods can be combined with symbolic methods, improving the applicability and efficiency of knowledge discovery. This is particularly true in classification, where supervised and unsupervised approaches can be combined with benefit.
A main operation in the research work of Orpailleur is “classification”, which is a polymorphic process involved in modeling, mining, representing, and reasoning tasks. In this way, domain knowledge, when available, can improve and guide the KDD process, materializing the idea of Knowledge Discovery guided by Domain Knowledge, or KDDK. In KDDK, domain knowledge plays a role at each step of KDD: the discovered patterns can be interpreted as knowledge units and reused for problem-solving activities in knowledge systems, implementing the exploratory process “mining, interpreting, modeling, representing, and reasoning”. Knowledge discovery can thus be considered a key task in knowledge engineering (KE), having an impact on various semantic activities, e.g. information retrieval, recommendation, and ontology engineering. In addition, if knowledge discovery can feed knowledge-based systems, domain knowledge can in turn be used to support the knowledge discovery process.
Finally, life sciences, i.e. agronomy, biology, chemistry, and medicine, are application domains where the Orpailleur team has a very rich experience. The team intends to keep and to extend this experience, while also paying more attention to the impact of knowledge discovery in the real world. This should lead to the design of green (sustainable), explainable, and fair data mining systems.
Keywords: text mining, knowledge discovery from texts, text classification, annotation, ontology engineering from texts.
The objective of a text mining process is to extract useful knowledge units from large collections of texts. Text mining has specific characteristics because texts are complex objects written in natural language. The information in a text is expressed in an informal way, following linguistic rules, which makes text mining a difficult task. A text mining process has to take into account –as much as possible– paraphrases, ambiguities, specialized vocabulary, and terminology. This is why the preparation of texts for text mining usually depends on linguistic resources and methods.
From a knowledge discovery perspective, text mining aims at extracting “interesting units” (nouns and relations) from texts with the help of domain knowledge encoded within a knowledge base. The process is roughly similar for text annotation. Text mining is especially useful in the context of the semantic web for ontology engineering. In the Orpailleur team, we work on the mining of real-world texts in application domains such as biology and medicine, using numerical and symbolic data mining methods. Accordingly, the text mining process can be involved in a loop that enriches and extends linguistic resources. In turn, linguistic and ontological resources can be exploited to guide a “knowledge-based text mining process”.
Keywords: knowledge engineering, web of data, semantic web, ontology, description logics, classification-based reasoning, case-based reasoning, information retrieval, recommendation.
The web of data constitutes a good platform for experimenting ideas on knowledge engineering (KE) and knowledge discovery.
A software agent may be able to read, understand, and manipulate information on the web, if and only if the knowledge necessary for achieving those tasks is available.
This is why domain knowledge and ontologies are of main importance.
OWL (“Web Ontology Language” https://
There are many interconnections between concept lattices in FCA and ontologies; e.g. the partial order underlying an ontology can be supported by a concept lattice, and a pair of implications within a concept lattice can provide a possible materialization of a concept definition in an ontology. In this way, we study how the web of data, considered as a set of knowledge sources, e.g. DBpedia, Wikipedia, Yago, and Freebase, can be mined for guiding the design of a knowledge base, and, further, how knowledge discovery techniques can be applied to allow a better usage of the web of data, e.g. Linked Open Data (LOD) classification and completion.
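To make the FCA machinery concrete, here is a minimal Python sketch (with a hypothetical toy context, not the team's actual data) that enumerates the formal concepts of a binary context by closing every attribute subset:

```python
from itertools import combinations

# Hypothetical toy context: objects described by binary attributes.
CONTEXT = {
    "duck":    {"flies", "lays_eggs"},
    "ostrich": {"lays_eggs"},
    "bat":     {"flies"},
}
ATTRIBUTES = {"flies", "lays_eggs"}

def intent(objects):
    """Attributes shared by all given objects."""
    common = set(ATTRIBUTES)
    for o in objects:
        common &= CONTEXT[o]
    return common

def extent(attrs):
    """Objects having all the given attributes."""
    return {o for o, a in CONTEXT.items() if attrs <= a}

def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every attribute subset.
    Exponential in the number of attributes, but enough to illustrate what a
    concept lattice contains."""
    concepts = set()
    attrs = sorted(ATTRIBUTES)
    for r in range(len(attrs) + 1):
        for combo in combinations(attrs, r):
            e = extent(set(combo))
            concepts.add((frozenset(e), frozenset(intent(e))))
    return concepts
```

The resulting set of (extent, intent) pairs, ordered by extent inclusion, is exactly the concept lattice that tools like those developed in the team compute with far more efficient algorithms.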
A part of the research work in knowledge engineering is thus oriented towards knowledge discovery in the web of data, as, with the increased interest in machine-processable data, more and more data is now published in RDF (Resource Description Framework) format. In particular, we are interested in the completeness of the data and their potential to provide concept definitions in terms of necessary and sufficient conditions. We have proposed algorithms based on FCA and redescription mining which allow data exploration as well as the discovery of definitions (bidirectional implication rules).
Keywords: knowledge discovery in life sciences, biology, chemistry, medicine, pharmacogenomics and precision medicine.
One major application domain currently investigated by the Orpailleur team is related to life sciences, with particular emphasis on biology, medicine, and chemistry. The understanding of biological systems provides complex problems for computer scientists, and the developed solutions bring new research ideas and possibilities for both biologists and computer scientists. Indeed, the interactions between researchers in biology and researchers in computer science improve not only knowledge about systems in biology, chemistry, and medicine, but also knowledge in computer science.
Knowledge discovery is gaining more and more interest and importance in life sciences for mining either homogeneous databases such as protein sequences and structures, or heterogeneous databases for discovering interactions between genes and the environment, or between genetic and phenotypic data, especially for public health and precision medicine (pharmacogenomics). Pharmacogenomics is one main challenge for the Orpailleur team as it considers a large panel of complex data ranging from biological to medical data, and various kinds of encoded domain knowledge ranging from texts to formal ontologies.
Along the same lines as biological data, chemical data present important challenges w.r.t. knowledge discovery, for example for mining collections of molecular structures and collections of chemical reactions in organic chemistry. The mining of such collections is an important task for various reasons, including the challenge of graph mining and industrial needs (especially in drug design, pharmacology, and toxicology). Molecules and chemical reactions are complex data that can be modeled as labeled graphs. Graph mining and Formal Concept Analysis methods play an important role in this application domain and can be used in an efficient and well-founded way.
Finally, research in agronomy is mainly based on cooperation between Inria and INRA. One research dimension is related to the characterization and the simulation of hedgerow structures in agricultural landscapes, based on Hilbert-Peano curves and Markov models. Another research dimension is based on the mining of survey data for evaluating groundwater quality risks.
This year we would like to mention two publications as highlights of the year.
The conference paper
Classical properties of functions such as associativity, although algebraically easy to read, are hard to meaningfully interpret.
In this work, Miguel Couceiro and colleagues
showed that associative and quasi-trivial operations that are non-decreasing are characterized in terms of total and weak orderings through the so-called single-peakedness property introduced in social choice theory by Duncan Black.
This enabled visual interpretations of the above-mentioned algebraic properties, and the enumeration of such operations led to several previously unknown integer sequences in Sloane’s On-Line Encyclopedia of Integer Sequences (http://
Analyse de Régularités dans les Paysages : Environnement, Territoires, Agronomie
Keywords: Stochastic process - Hidden Markov Models
Functional Description: ARPEnTAge is a software package based on stochastic models (second-order HMMs and Markov fields) for analyzing spatio-temporal databases. ARPEnTAge is built on top of the CarottAge system to fully take into account the spatial dimension of input sequences. It takes as input an array of discrete data in which the columns contain the annual land uses and the rows are regularly spaced locations of the studied landscape. It performs a time-space clustering of a landscape based on its time-dynamic land uses (LUs). Display tools and the generation of time-dominant shapefiles have also been developed.
Partner: INRA
Contact: Jean-François Mari
Keywords: Stochastic process - Hidden Markov Models
Functional Description: The CarottAge system is based on second-order Hidden Markov Models and provides an unsupervised temporal clustering algorithm for data mining and a synthetic representation of temporal and spatial data. CarottAge is currently used by INRA researchers interested in mining the changes in territories related to the loss of biodiversity (projects ANR BiodivAgrim and ACI Ecoger) and/or water contamination. CarottAge is also used for mining hydromorphological data: a comparison was performed with three other algorithms classically used for the delineation of the river continuum, and CarottAge proved to give very interesting results for that purpose.
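As an illustration of the kind of model underlying CarottAge, the following sketch implements Viterbi decoding for a plain first-order HMM (CarottAge itself relies on second-order models and unsupervised clustering; any states, probabilities, and observations supplied to this function are hypothetical):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for an observation sequence
    (first-order HMM for brevity)."""
    # V[t][s] = (probability of the best path ending in state s at time t,
    #            predecessor state at time t-1)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p][0] * trans_p[p][s])
            V[t][s] = (V[t - 1][prev][0] * trans_p[prev][s] * emit_p[s][obs[t]],
                       prev)
    # Backtrack from the best final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        last = V[t][last][1]
        path.append(last)
    return list(reversed(path))
```

A second-order model would condition each transition on the two previous states instead of one, which is what allows CarottAge to capture longer land-use regularities.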
Participants: Florence Le Ber and Jean-François Mari
Partner: INRA
Contact: Jean-François Mari
Keywords: Data mining - Closed itemset - Frequent itemset - Generator - Association rule - Rare itemset
Functional Description: The Coron platform is a KDD toolkit organized around three main components: (1) Coron-base, (2) AssRuleX, and (3) pre- and post-processing modules.
The Coron-base component includes a complete collection of data mining algorithms for extracting itemsets such as frequent itemsets, closed itemsets, generators, and rare itemsets. This collection includes APriori, Close, Pascal, Eclat, and Charm, as well as original algorithms such as ZART, Snow, Touch, and Talky-G. AssRuleX generates different sets of association rules (from itemsets), such as minimal non-redundant association rules, the generic basis, and the informative basis. In addition, the Coron system supports the whole life cycle of a data mining task and proposes modules for cleaning the input dataset and for reducing its size if necessary.
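As a rough illustration of the levelwise mining performed by algorithms of the APriori family, here is a minimal, unoptimized Python sketch (the transactions are hypothetical; Coron's implementations are far more elaborate):

```python
def apriori(transactions, min_support):
    """Levelwise frequent itemset mining: a k-itemset can be frequent only if
    all of its (k-1)-subsets are frequent (anti-monotonicity of support)."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({i for t in transactions for i in t})
    frequent = {}
    level = [frozenset([i]) for i in items]
    while level:
        current = {s: support(s) for s in level}
        current = {s: sup for s, sup in current.items() if sup >= min_support}
        frequent.update(current)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate having an infrequent k-subset.
        candidates = set()
        for a in current:
            for b in current:
                u = a | b
                if len(u) == len(a) + 1 and all(u - {x} in current for x in u):
                    candidates.add(u)
        level = list(candidates)
    return frequent
```

Association rules are then derived from the frequent (or closed) itemsets by splitting each itemset into an antecedent and a consequent and checking a confidence threshold.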
Participants: Adrien Coulet, Aleksey Buzmakov, Amedeo Napoli, Florent Marcuola, Jérémie Bourseau, Laszlo Szathmary, Mehdi Kaytoue, Victor Codocedo and Yannick Toussaint
Contact: Amedeo Napoli
Keywords: Formal Concept Analysis, Pattern Structures, Concept Lattice, Implications, Visualization
Functional Description.
LatViz is a tool allowing the construction, display, and exploration of concept lattices. LatViz proposes some noticeable improvements over existing tools and introduces various functionalities focusing on interaction with experts, such as the visualization of pattern structures for dealing with complex non-binary data, the AOC-poset, which is composed of the core elements of the lattice, concept annotations, filtering based on various criteria, and a visualization of implications. In this way, the user can effectively perform interactive exploratory knowledge discovery, as often needed in knowledge engineering.
The LatViz platform can be associated with the Coron platform and extends its visualization capabilities (see http://
Contact: Laureline Nevin
Keywords: Bioinformatics, data mining, biology, health, data visualization, drug development.
Functional Description.
The OrphaMine platform enables visualization, data integration and in-depth analytics in the domain of “orphan diseases”, where data is extracted from the OrphaData ontology (http://
Contact: Esther Catherine Galbrun
Keywords: Redescription mining, Interactivity, Visualization.
Functional Description.
Siren is a tool for the interactive mining and visualization of redescriptions. Redescription mining aims to find distinct common characterizations of the same objects and, vice versa, to identify sets of objects that admit multiple shared descriptions. The goal is to provide domain experts with a tool allowing them to tackle their research questions using redescription mining. Merely being able to find redescriptions is not enough: the expert must also be able to understand the redescriptions found, adjust them to better match their domain knowledge, and test alternative hypotheses with them, for instance. Thus, Siren allows the expert to mine redescriptions in an anytime fashion through efficient, distributed mining, to examine the results in various linked visualizations, to interact with the results either directly or via the visualizations, and to guide the mining algorithm toward specific redescriptions.
New features, such as a visualization of the contribution of individual literals in the queries and the simplification of queries as a post-processing step, have been added to the tool.
Advances in data and knowledge engineering have emphasized the need for pattern mining tools working on complex and possibly large data. FCA, which usually applies to binary data tables, can be adapted to work on more complex data. In this way, we have contributed to some main extensions of FCA, namely Pattern Structures, Relational Concept Analysis, and the application of the “Minimum Description Length” (MDL) principle within FCA. Pattern Structures (PS) allow building a concept lattice from complex data, e.g. numbers, sequences, trees, and graphs. Relational Concept Analysis (RCA) is able to analyze objects described both by binary and relational attributes and can play an important role in text classification and text mining. Many developments were carried out in pattern mining and FCA for improving data mining algorithms and their applicability, and for solving specific problems such as information retrieval, the discovery of functional dependencies, and biclustering.
We obtained several results in the discovery of approximate functional dependencies, the mining of RDF data, the visualization of discovered patterns, and redescription mining.
Moreover, based on Relational Concept Analysis, we also worked on the discovery and representation of
We are also working on designing hybrid mining methods, based on mining methods able to deal with symbolic and numerical data in parallel. In the context of the GEENAGE project, we are interested in the identification, in biomedical data, of biomarkers that are predictive of the development of diseases in the elderly population. The data come from a preceding study on metabolomic data for the detection of type 2 diabetes. The problem can be viewed as a classification problem in which features that are predictive of a class should be identified. This led us to study the notions of prediction and discrimination in classification problems. Combining numerical machine learning methods such as random forests, neural networks, and SVMs with multicriteria decision-making methods (Pareto fronts) and pattern mining methods (including FCA), we developed a hybrid mining approach for selecting the features that are the most predictive and/or the most discriminant. The selected features are then organized within a concept lattice to be presented to the analyst, together with the reasons for their selection. The concept lattice makes the understanding of the feature selection easier and more natural. As such, this approach can also be seen as an explainable mining method, where the output includes the reasons for which features are selected in terms of prediction and discrimination.
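The multicriteria step of such a hybrid approach can be sketched as follows: given two scores per feature (one for prediction, one for discrimination), keep the features on the Pareto front, i.e. those not dominated on both criteria. This is only an illustrative sketch with hypothetical feature names and scores, not the team's implementation:

```python
def pareto_front(scores):
    """Keep the features not dominated on both criteria (higher is better).
    scores: dict feature -> (predictive_score, discriminant_score)."""
    front = []
    for f, (p, d) in scores.items():
        dominated = any(
            (p2 >= p and d2 >= d) and (p2 > p or d2 > d)
            for g, (p2, d2) in scores.items() if g != f
        )
        if not dominated:
            front.append(f)
    return sorted(front)
```

The retained features can then be organized in a concept lattice, each concept grouping features selected for the same combination of reasons.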
In the framework of the CrossCult European project on cultural heritage, we worked on the mining of visitor trajectories in a museum or a touristic site. We presented theoretical and practical research work on the characterization of visitor trajectories and the mining of these trajectories as sequences. The mining process is based on two approaches in the framework of FCA. We focused on different types of sequences, more precisely on subsequences without any constraint and on frequent contiguous subsequences. We also introduced a similarity measure allowing us to build a hierarchical classification which is used for the interpretation and characterization of the trajectories. A natural extension of this research work on the characterization of trajectories is related to recommendation, i.e., given an actual trajectory, how to recommend the next items to be visited? Biclustering is a good candidate for designing recommendation methods, and we worked especially on this topic this year. In particular, we worked on several aspects of biclustering in the framework of FCA, and we also tried to build a generic and unified framework from which several biclustering methods can be derived.
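The notion of frequent contiguous subsequence can be illustrated with a small Python sketch (the trajectories below are hypothetical room sequences, not CrossCult data):

```python
from collections import Counter

def frequent_contiguous_subsequences(trajectories, min_count, max_len=4):
    """Count contiguous subsequences (factors) across trajectories and keep
    those occurring in at least min_count distinct trajectories."""
    counts = Counter()
    for traj in trajectories:
        seen = set()
        for i in range(len(traj)):
            for j in range(i + 1, min(i + 1 + max_len, len(traj) + 1)):
                seen.add(tuple(traj[i:j]))
        counts.update(seen)  # each factor counted once per trajectory
    return {s: c for s, c in counts.items() if c >= min_count}
```

Unconstrained subsequences would be obtained similarly, but allowing gaps between the visited items instead of requiring contiguity.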
Redescription mining is one of the pattern mining methods developed in the team. This method aims at finding distinct common characterizations of the same objects and, reciprocally, at identifying sets of objects having multiple shared descriptions. It is motivated by the idea that, in scientific investigations, data often have different natures: for example, they might originate from distinct sources or be cast over separate terminologies.
In order to gain insight into the phenomenon of interest, a natural task is to identify the correspondences existing between these different aspects. A practical example in biology consists in finding geographical areas having two characterizations, one in terms of their climatic profile and one in terms of the occupying species. Discovering such redescriptions can contribute to a better understanding of the influence of climate over species distribution. Besides biology, redescription mining can be applied in many concrete domains.
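A drastically simplified sketch of the redescription idea: pair single predicates from two views (e.g. climate vs. species) whose supports, i.e. sets of objects, nearly coincide, as measured by the Jaccard similarity. Real redescription miners search over full Boolean queries; the predicates and object identifiers here are hypothetical:

```python
def jaccard(s1, s2):
    """Jaccard similarity of two sets of objects."""
    return len(s1 & s2) / len(s1 | s2) if s1 | s2 else 1.0

def mine_redescriptions(left_preds, right_preds, min_jaccard=0.8):
    """Pair predicates from two views whose supports nearly coincide.
    left_preds, right_preds: dict predicate_name -> set of supporting objects."""
    results = []
    for lname, lsup in left_preds.items():
        for rname, rsup in right_preds.items():
            j = jaccard(lsup, rsup)
            if j >= min_jaccard:
                results.append((lname, rname, j))
    return results
```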
Along these lines, we applied redescription mining to the analysis and mining of RDF data in the web of data, with the objective of discovering definitions of concepts, as well as disjunctions (incompatibilities) of concepts, for completing knowledge bases in a semi-automated way. Redescription mining is well adapted to this task, as a definition is naturally based on the two sides of an equation, a left-hand side and a right-hand side.
The research work in text mining is mainly based on two ongoing PhD theses. The first research subject is related to the study of discourse and argumentation structures in a text based on tree mining and redescription mining, while the second is related to the mining of PubMed abstracts about rare diseases. In the first research line, we investigate the similarities existing between discourse and argumentation structures by aligning subtrees in a corpus where texts are annotated. In contrast with related work, here we focus on the comparison of substructures within the text and not only on the matching of relations. Based on data mining techniques such as tree mining and redescription mining, we are able to show that the structures underlying discourse and argumentation can be (partially) aligned. The annotations related to discourse and argumentation allow us to derive a mapping between the structures. In addition, the approach enables the study of similarities between diverse discourse structures, as well as of the differences in terms of expressive power.
In the second research line, the objective is to discover features related to rare diseases, e.g. symptoms, related diseases, treatments, and possible disease evolutions or variations. The texts to be analyzed come from PubMed, a platform collecting millions of publications in the medical domain. This research project aims at developing new methods and tools for supporting knowledge discovery in textual data by combining methods from Natural Language Processing (NLP) and Knowledge Discovery in Databases (KDD). A key idea is to design an interactive and convergent process in which NLP methods are used for guiding text mining and KDD methods are used for analyzing textual documents. In this setting, NLP is based on the extraction of general and temporal information, while KDD methods are especially based on pattern mining, FCA, and graph mining.
Aggregation and consensus theory study processes dealing with the problem of merging or fusing several objects, e.g. numerical or qualitative data, preferences, or other relational structures, into one or several objects of similar type that best represent them in some way. Such processes are modeled by so-called aggregation or consensus functions. The need to aggregate objects in a meaningful way appeared naturally in classical topics such as mathematics, statistics, physics, and computer science, but it has become increasingly important in applied areas such as social and decision sciences, artificial intelligence and machine learning, and biology and medicine.
We are working on the theoretical basis of a unified theory of consensus and on setting up a general machinery for the choice and use of aggregation functions. This choice depends on properties specified by users or decision makers, on the nature of the objects to aggregate, and on computational limitations due to prohibitive algorithmic complexity. This problem demands an exhaustive study of aggregation functions, which requires an axiomatic treatment and classification of aggregation procedures as well as a deep understanding of their structural behavior. It also requires a representation formalism for knowledge, in our case decision rules, and methods for discovering them. Typical approaches include rough-set and FCA approaches, which we aim to extend in order to increase the expressivity, applicability, and readability of results. Applications of these efforts have already appeared, and further ones are expected in the context of three multidisciplinary projects, namely the RHU “Fighting Heart Failure” project (research project with the Faculty of Medicine in Nancy), the European H2020 “CrossCult” project, and the “ISIPA” (Interpolation, Sugeno Integral, Proportional Analogy) project.
In the context of the RHU project “Fighting Heart Failure” (which aims to identify and describe relevant bio-profiles of patients suffering from heart failure), we are dealing with highly complex and heterogeneous biomedical data that include, among others, sociodemographic aspects, biological and clinical features, and drugs taken by the patients. One of our main challenges is to define relevant aggregation operators on these heterogeneous patient data that lead to a clustering of the patients. Each cluster should correspond to a bio-profile, i.e. a subgroup of patients sharing the same form of the disease and thus the same diagnosis and medical care strategy. We are working on ways of comparing and clustering patients, namely by defining multidimensional similarity measures on these complex and heterogeneous biomedical data. To this end, we recently proposed a novel approach, which we named “unsupervised extremely randomized trees” (UET), inspired by the frameworks of unsupervised random forests (URF) and of extremely randomized trees (ET). An empirical study of UET showed that it outperforms existing methods (such as URF) in running time, while giving better clustering results. However, UET was implemented for numerical data only, which is a drawback when dealing with biomedical data.
To overcome this limitation, we have recently proposed an adaptation of UET that is agnostic to variable types (numerical, symbolic, or both) and that is robust to noise, to correlated variables, and to monotone transformations, thus drastically limiting the need for preprocessing. In addition, this provides similarity measures for clustering purposes that outperform state-of-the-art clustering methodologies.
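The underlying intuition, that two samples are similar when random splits rarely separate them, can be sketched as follows for mixed numerical/symbolic rows. This is an illustrative sketch inspired by UET, with hypothetical data, not the actual algorithm:

```python
import random

def random_split_similarity(data, n_splits=200, seed=0):
    """Pairwise similarity: fraction of random axis-aligned splits placing
    two samples on the same side. data: list of dicts mapping feature names
    to numeric or symbolic values."""
    rng = random.Random(seed)
    n = len(data)
    agree = [[0] * n for _ in range(n)]
    features = list(data[0])
    for _ in range(n_splits):
        f = rng.choice(features)
        values = [row[f] for row in data]
        if isinstance(values[0], (int, float)):
            # Numeric feature: random threshold between observed min and max.
            cut = rng.uniform(min(values), max(values))
            side = [v <= cut for v in values]
        else:
            # Symbolic feature: random subset of categories on one side.
            left = {c for c in set(values) if rng.random() < 0.5}
            side = [v in left for v in values]
        for i in range(n):
            for j in range(n):
                if side[i] == side[j]:
                    agree[i][j] += 1
    return [[agree[i][j] / n_splits for j in range(n)] for i in range(n)]
```

The resulting similarity matrix can then be fed to any standard clustering algorithm, which is the role such measures play in the patient-clustering work described above.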
Also, motivated by current trends in graph clustering for applications in the semantic web and in community identification in computer and social networks, we recently proposed a novel graph clustering method, GraphTrees, which is based on random decision trees to compute pairwise dissimilarities between vertices in vertex-attributed graphs. Unlike existing methodologies, it applies directly, without preprocessing, to graphs whose vertex attributes are heterogeneous, with promising results on benchmark datasets that are competitive with the best known methods.
In the context of the ISIPA project, we mainly focused on the utility-based preference model, in which preferences are represented as an aggregation of preferences over different attributes, structured or not, both in the numerical and in the qualitative settings. In the latter case, the Sugeno integral is widely used in multiple criteria decision making and decision under uncertainty for computing global evaluations of items based on local evaluations (utilities). The combination of a Sugeno integral with local utilities is called a Sugeno utility functional (SUF). A noteworthy property of SUFs is that they represent multi-threshold decision rules. However, not all sets of multi-threshold rules can be represented by a single SUF. We showed how to represent any set of multi-threshold rules as a combination of SUFs. Moreover, we studied their potential advantages as a compact representation of large sets of rules, as well as an intermediary step for extracting rules from empirical datasets. We also proposed a novel method for learning sets of decision rules that optimally fit the training data and that favors short rules over long ones. This is a competitive alternative to other methods for monotonic classification.
Biomedical objects can be characterized by ontology annotations. For example, Gene Ontology annotations provide information on the functions of genes, while Human Phenotype Ontology (HPO) annotations provide information about the phenotypes associated with diseases. It is usual to consider such annotations in the analysis of biomedical data, most of the time with annotations from one single ontology. However, complex objects such as diseases can be annotated w.r.t. different ontologies at the same time, making distinct dimensions explicit. We are investigating how annotations from several ontologies may cooperate in disease classification. In particular, we classified genetic intellectual disabilities on the basis of their HPO annotations and of the Gene Ontology annotations of genes known to be responsible for these diseases. We used clustering algorithms based on semantic similarities that enable us to compare sets of annotations. This experiment illustrates the fact that considering several ontologies provides better clustering results, while selecting the best set of ontologies to combine depends on the dataset and on the classification task. This study is still ongoing.
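A simple way to compare sets of ontology annotations, in the spirit of the semantic similarities mentioned above, is to close each set under ontology ancestors and take the Jaccard similarity of the closures. This is a minimal sketch with a hypothetical term hierarchy, not the measures actually used in the study:

```python
def ancestors(term, parents):
    """All ancestors of a term, given a term -> list-of-parents map."""
    result, stack = set(), [term]
    while stack:
        t = stack.pop()
        for p in parents.get(t, []):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def annotation_similarity(annots1, annots2, parents):
    """Jaccard similarity of two annotation sets after closing them under
    ontology ancestors, so that shared ancestors count as overlap."""
    def close(annots):
        closed = set(annots)
        for t in annots:
            closed |= ancestors(t, parents)
        return closed
    c1, c2 = close(annots1), close(annots2)
    return len(c1 & c2) / len(c1 | c2) if c1 | c2 else 1.0
```

A similarity matrix built this way over a set of diseases can then be given to a standard clustering algorithm, one matrix per ontology or one combined matrix for several ontologies.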
State of the art knowledge in pharmacogenomics is heterogeneous w.r.t. validation.
Some units of knowledge are well validated, observed on large populations, and already used in clinical practice, while a large majority of this knowledge lacks validation and reproducibility, mainly because of scarce observations.
Accordingly, validating state of the art knowledge in pharmacogenomics by mining Electronic Health Records (EHRs) is one objective of the ANR project “PractiKPharma” initiated in 2016 (http://
To carry out this validation, we define a minimal data schema for pharmacogenomic knowledge units (PGxO ontology), which is instantiated with data of different provenance (e.g. biomedical databases, literature and EHRs).
The output of this instantiation is a (unique) knowledge graph called PGxLOD (https://
In addition, following our participation in the Biohackathon 2018 in Paris (https://
In the context of the Snowball Inria Associate Team, we studied the use of Electronic Health Records (EHRs) to predict, at first prescription, the need for a patient to be prescribed a reduced drug dose. We particularly focused on drugs whose dosage is known to be sensitive and variable. We used data from the Stanford Hospital to construct cohorts of patients that either did or did not need a dose change for each considered drug. After feature selection, we trained Random Forest models which successfully predict whether a new patient will or will not require a dose change after being prescribed one of 23 drugs among 22 drug classes. Several of these drugs are related to clinical guidelines that recommend dose reduction exclusively in the case of an adverse reaction. For these cases, a reduction in dosage may be considered a surrogate for an adverse reaction, which our system could help to predict and prevent.
In collaboration with Stanford University, we continued the development of predictive models from EHR data, in particular to evaluate the risk of atherosclerotic cardiovascular disease (ASCVD). Evaluating ASCVD risk is crucial for deciding whether to prescribe preventive therapies such as statins and other lipid-lowering therapies. The prevalence of these diseases depends on subgroups of the population, such as African-American and Asian people, who are under-represented in the cohorts used to fit the model currently used in clinics to evaluate ASCVD risk . Because of this under-representation, biases appear in the risk evaluation for these subgroups. We therefore proposed a method and a predictive model that control, to some extent, the variability of ASCVD risk prediction for such under-represented subgroups .
A first research topic in this axis is knowledge discovery in the web of data, following the increase of data published in RDF (Resource Description Framework) format and the interest in machine-processable data. The rapid growth of Linked Open Data (LOD) raises challenging questions regarding quality assessment and exploration of the RDF triples that shape the LOD cloud. In the team, we are particularly interested in the completeness and quality of these data and in their potential to provide concept definitions in terms of necessary and sufficient conditions , . We proposed a novel technique based on Formal Concept Analysis (FCA) which classifies subsets of RDF data into a concept lattice. This lattice supports data exploration as well as the discovery of implication rules, which are used to automatically detect possible completions of RDF data and to provide definitions. Experiments on the DBpedia knowledge base show that this approach is well-founded and effective . It should also be noted that this research work involves redescription mining, showing the potential complementarity between definition mining and redescription mining.
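The FCA machinery underlying this approach can be illustrated on a tiny binary context. This is a didactic sketch, assuming a made-up context of DBpedia-like resources and predicate-derived attributes; it enumerates the formal concepts by closing attribute sets, and shows how an implication (here, capitalOf implies city and inEurope) emerges from the closure.

```python
from itertools import combinations

# Toy binary context: objects (DBpedia-like resources) x attributes
# (presence of RDF predicates). Entirely hypothetical.
context = {
    "Paris":  {"capitalOf", "city", "inEurope"},
    "Berlin": {"capitalOf", "city", "inEurope"},
    "Lyon":   {"city", "inEurope"},
}
attributes = set().union(*context.values())

def extent(attrs):
    """Objects possessing all attributes in attrs."""
    return {o for o, a in context.items() if attrs <= a}

def intent(objs):
    """Attributes shared by all objects in objs."""
    return set.intersection(*(context[o] for o in objs)) if objs else set(attributes)

# A formal concept is a pair (extent, intent) closed under the two derivation maps.
concepts = []
for r in range(len(attributes) + 1):
    for attrs in combinations(sorted(attributes), r):
        e = extent(set(attrs))
        i = intent(e)
        if (e, i) not in concepts:
            concepts.append((e, i))

# Implication read off the lattice: capitalOf -> {city, inEurope}.
closure_of_capital = intent(extent({"capitalOf"}))
```

On this context the closure of {capitalOf} is {capitalOf, city, inEurope}, i.e., every capital in the data is a European city, which is exactly the kind of candidate definition or completion rule mentioned above.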
The second topic in this axis is related to dependencies .
In the relational database model, functional dependencies (FDs) indicate a functional relation between sets of attributes: the values of a set of attributes are determined by the values of another set of attributes.
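This definition can be checked mechanically. The sketch below, on a hypothetical bibliographic relation, tests whether an FD X -> Y holds by verifying that each value combination on X determines a unique value combination on Y; the data and attribute names are illustrative only.

```python
def holds_fd(rows, X, Y):
    """Return True iff the functional dependency X -> Y holds in rows:
    equal values on X imply equal values on Y."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in X)
        val = tuple(row[a] for a in Y)
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical bibliographic relation.
rows = [
    {"isbn": "1-2345", "title": "FCA Basics", "year": 1999},
    {"isbn": "1-2345", "title": "FCA Basics", "year": 1999},
    {"isbn": "6-7890", "title": "Pattern Mining", "year": 2014},
    {"isbn": "9-1111", "title": "Graph Mining", "year": 1999},
]

fd_isbn = holds_fd(rows, ["isbn"], ["title", "year"])  # isbn determines title and year
fd_year = holds_fd(rows, ["year"], ["isbn"])           # year does not determine isbn
```

Here isbn -> {title, year} holds, while year -> isbn fails because two distinct books share the year 1999.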
FDs can be generalized into relational dependencies, also known as “link keys” in the web of data .
For example, link keys may identify the same book or article in different bibliographical data sources: such a link key states that whenever two instances from the two sources agree on the values of given pairs of properties, they should be linked as referring to the same entity.
One main objective of this research work is to follow the lines initiated in recent papers , and to extend to link keys the characterization of FDs and of similarity dependencies within FCA and pattern structures; this is one of the objectives of the ANR ELKER project. Accordingly, one purpose is to extend the initial FCA-based proposals and to provide adapted implementations. This is part of the thesis work of Nacira Abbas, initiated at the end of 2018 . Moreover, we are currently investigating possible connections with Relational Concept Analysis and redescription mining. We would like to formulate the discovery of link keys by reusing and extending construction heuristics developed in redescription mining. Redescription mining is a data mining technique which aims at constructing pairs of descriptions, i.e., pairs of logical statements, one for each of two datasets, such that their support sets, i.e., the sets of objects satisfying each statement of a pair, are as similar as possible, as measured for example by their Jaccard index.
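The Jaccard scoring of a candidate redescription pair is simple to state concretely. In the sketch below the support sets are hypothetical object identifiers; a redescription miner would compute such a score for every candidate pair of descriptions and keep the best-scoring ones.

```python
# Support sets of a candidate redescription pair over two datasets
# (hypothetical object identifiers).
support_left = {1, 2, 3, 5, 8}    # objects satisfying the first description
support_right = {2, 3, 5, 8, 13}  # objects satisfying the second description

def jaccard(a, b):
    """Jaccard index: size of intersection over size of union."""
    return len(a & b) / len(a | b)

score = jaccard(support_left, support_right)  # 4 shared objects / 6 in total
```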
The AGREV 3 project (for “Agriculture Environment Vittel”) is part of the actions of “Agrivair” –a subsidiary of Nestlé Waters– to protect the natural resources of natural mineral water. We used ARPEnTAge to mine survey data about the Vittel-Contrexéville territory, where groundwater quality is considered at risk . This allowed us to locate regions having the same behavior. In addition, it provided a more contrasted simulation, by eliminating the influence of stable zones (forests, permanent grasslands), and a more precise definition of a “neutral” model.
Hydreos is a state organization, a so-called “Pôle de compétitivité” (competitiveness cluster), aimed at monitoring and evaluating the quality of water and its delivery (http://
On other aspects, we tested deep graph convolutional learning on data provided by the SEDIF (“Syndicat des eaux d'Île-de-France”) to predict the likelihood of water leaks in a network of pipes, and compared the results with spatial point process techniques (master's thesis of Nicolas Dante, M2 IMSD Nancy).
The SKD project, for “Smart Knowledge Discovery”, aims at analyzing complex industrial data for troubleshooting and decision making; it is funded by the “Grand Est” Region. We are working on exploratory knowledge discovery with the Vize company, based in Nancy and specialized in visualization-based data mining. The data under study are provided by the ArcelorMittal steel company and are related to the monitoring of rolling mills. These data are complex time series, and the problem is one of so-called “predictive maintenance”: how to anticipate problems in the furnaces and avoid having to stop them. Accordingly, one main objective of SKD is to combine sequence mining and visualization tools for recognizing temperature problems in the furnaces, thus preventing defects in the outputs of the rolling mills.
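A first step of such sequence mining can be sketched as follows. This is a toy illustration, not the project's method: the temperature readings and thresholds are invented, and frequent n-grams over a discretized series stand in for the sequential patterns actually mined.

```python
from collections import Counter

# Hypothetical furnace temperature readings discretized into symbols:
# L (low), N (normal), H (high). Thresholds are made up.
def discretize(series, low=680.0, high=720.0):
    return "".join("L" if t < low else "H" if t > high else "N" for t in series)

readings = [700, 705, 731, 736, 733, 702, 698, 730, 735, 701]
symbols = discretize(readings)

# Frequent n-grams over the symbolic series act as candidate
# sequential patterns (e.g., repeated high-temperature episodes).
def frequent_ngrams(s, n, min_count):
    counts = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    return {g: c for g, c in counts.items() if c >= min_count}

patterns = frequent_ngrams(symbols, n=2, min_count=2)
```

Here the pattern "HH" (two consecutive overheated readings) shows up most often, the kind of motif one would want to surface visually and flag before a defect occurs.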
The objectives of the ELKER ANR Research Project (https://
PractiKPharma for “Practice-based evidences for actioning Knowledge in Pharmacogenomics” is an ANR research project (http://
Astronomical surveys planned for the coming years will produce data that present analysis challenges not only because of their scale (hundreds of petabytes), but also because of the complexity of the measurements on very deep images (for instance, subpercent-level measurement of the colors or shapes of blended objects). New machine learning techniques appear very promising: once trained, they are very efficient and excel at extracting features from complex images. In the AstroDeep project, we aim at developing machine learning techniques that can be applied directly to complex images, without going through the traditional steps of astronomical image processing, which lose information at each stage. The developed techniques will help leverage the observation capabilities of future surveys (LSST, Euclid, and WFIRST) and will allow a joint analysis of their data.
The AstroDeep ANR Project involves three labs, namely APC Paris (“Astroparticules et Cosmologie Paris”), the Orpailleur Team at Inria Nancy Grand Est/LORIA, and “Département d'Astrophysique CEA Saclay”.
Recent progress in Machine Learning (ML), and especially in Deep Learning, has made ML prominent in a wide range of applications. However, current efficient ML approaches rely on complex numerical models. As a result, the decisions they propose may be accurate but cannot be easily explained to the layman, especially when complex and human-oriented decisions are at stake, e.g., granting a loan or admitting a student to a chosen university program. The objective of the HyAIAI IPL is to study how to make ML methods interpretable. To this end, we will design hybrid ML approaches that combine state-of-the-art numerical models (e.g., neural networks) with explainable symbolic models (e.g., pattern mining). More precisely, the goals are to integrate high-level domain constraints into ML models, to provide model designers with information on the ill-performing parts of the model, and to give the layman/practitioner understandable explanations of the results of the ML model.
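One simple instance of this numeric/symbolic hybridization can be sketched as follows. This is an illustrative assumption, not the HyAIAI design: a shallow decision tree (a symbolic, human-readable model) is trained to mimic a neural network on synthetic data, so that its rules serve as an explanation of the black box; fidelity measures how well the surrogate matches the network's decisions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # hypothetical decision boundary

# Black-box numerical model.
black_box = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                          random_state=1).fit(X, y)

# Symbolic surrogate: a shallow tree trained on the black box's own
# predictions, yielding human-readable if-then rules.
surrogate = DecisionTreeClassifier(max_depth=2, random_state=1)
surrogate.fit(X, black_box.predict(X))

fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
rules = export_text(surrogate, feature_names=["x0", "x1"])
```

The extracted `rules` string reads as nested threshold tests on x0 and x1, the kind of explanation a practitioner can inspect, at the cost of some fidelity to the original model.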
The HyAIAI IPL project involves seven Inria Teams, namely Lacodam in Rennes (project leader), Magnet and SequeL in Lille, Multispeech and Orpailleur in Nancy, and TAU in Saclay.
One of the outputs of the former Hybride ANR project was the Orphamine system, which aims at information retrieval and diagnosis aid in the domain of rare diseases. The Orphamine system is based on domain knowledge, and in particular on medical ontologies such as ORDO (“Orphanet Rare Diseases Ontology”) and HPO (“Human Phenotype Ontology”). The objective of the “Ordem” ADT is to update Orphamine by making the system more accessible and more open. This requires many developments: connections with domain knowledge, graph mining methods for retrieving relevant units in knowledge graphs, visualization tools, pattern mining, statistical tools for decision making (in particular log-linear models), as well as text mining tools for analyzing expert queries and the medical literature about rare diseases. These developments are and will be carried out until the end of next year, to make the system robust and publicly accessible through a web interface.
Finally, the so-called “projet de recherche exploratoire” (PRE) HyGraMi, for “Hybrid Graph Mining for the Design of New Antibacterials”, addresses the fight against the resistance of bacteria to antibiotics. The objective of HyGraMi is to design a hybrid data mining system for discovering new antibacterial agents. This system relies on a combination of numerical and symbolic classifiers, guided by expert domain knowledge. The analysis and classification of chemical structures is based on an interaction between symbolic methods, e.g., graph mining techniques, and numerical supervised classifiers based on exact and approximate matching. This year, we worked on a method based on tree decomposition for performing feature selection and improving the mining of such complex molecular structures .
The H2020 CrossCult
The CrossCult project involved many teams, namely the Luxembourg Institute of Science and Technology and the Centre Virtuel de la Connaissance sur l'Europe (Luxembourg, leaders of the project), University College London (England), University of Malta (Malta), University of Peloponnese and Technological Educational Institute of Athens (Greece), Università degli Studi di Padova (Italy), University of Vigo (Spain), the National Gallery (London, England), and GVAM Guías Interactivas (Spain), as well as the Kiwi team from LORIA together with the Orpailleur team.
Inria@SiliconValley
Associate Team involved in the International Lab:
Title: Discovering knowledge on drug response variability by mining electronic health records
International Partner (Institution - Laboratory - Researcher):
University of Stanford (United States) - Department of Medicine, Stanford Center for Biomedical Informatics Research (BMIR) - Nigam Shah
Start year: 2017
See also: http://
Snowball (2017-2019) is an Inria Associate Team and the continuation of the preceding Associate Team Snowflake (2014-2016). The objective of Snowball is to study drug response variability through the lens of Electronic Health Records (EHRs). This is motivated by the fact that many factors, genetic as well as environmental, lead different people to respond differently to the same drug. The mining of EHRs can bring substantial elements for understanding and explaining drug response variability.
Accordingly, the objectives of Snowball are to identify in EHR repositories groups of patients who respond differently to similar treatments, and then to characterize these groups and predict patient drug sensitivity. These objectives are complementary to those of the PractiKPharma ANR project. Moreover, Adrien Coulet finished in September 2019 a two-year sabbatical stay in the lab of Nigam Shah at Stanford University, initiated in September 2017 (and partly funded by an “Inria délégation”).
An ongoing collaboration involves the Orpailleur team and Sergei O. Kuznetsov at Higher School of Economics in Moscow (HSE).
Amedeo Napoli has visited the HSE laboratory several times, while Sergei O. Kuznetsov visits Inria Nancy Grand Est every year.
The collaboration is materialized by the joint supervision of students (such as the thesis of Aleksey Buzmakov, defended in 2015, and the ongoing thesis of Tatiana Makhalova) and by the organization of scientific events, such as the FCA4AI workshop, with seven editions between 2012 and 2019 (see http://
This year, we co-authored publications around the thesis work of Tatiana Makhalova and organized one main event, namely the seventh edition of the FCA4AI workshop, in August 2019 at the IJCAI Conference held in Macao, China.
Amedeo Napoli was, with Sergei O. Kuznetsov, the scientific co-chair of the track “General Topics of Data Analysis” at the AIST Conference held in Kazan, Russia, on July 17-19, 2019 (8th International Conference on Analysis of Images, Social Networks, and Texts, http://
Amedeo Napoli was, with Sergei O. Kuznetsov (HSE Moscow) and Sebastian Rudolph (TU Dresden), the scientific co-chair of the seventh FCA4AI workshop (“What can FCA do for Artificial Intelligence?”), which was co-located with the IJCAI Conference in Macao, China, on August 10, 2019 (see http://
Miguel Couceiro and Amedeo Napoli were the general and scientific chairs of the 26ièmes Rencontres de la Société Francophone de Classification (SFC 2019) that were held on September 3-5 at Inria NGE/LORIA Nancy (see https://
The scientific animation in the Orpailleur team is based on the Team Seminar which is called the “Malotec” seminar (http://
Members of the Orpailleur team are all involved, as members or as head persons, in various national research groups.
The members of the Orpailleur team are involved in the organization of conferences and workshops, as members of conference program committees (AAAI, ECAI, ECML-PKDD, ESWC, ICCBR, ICDM, ICFCA, IJCAI, ISWC, KDD, SDM...), as members of editorial boards, and finally in the organization of journal special issues.
All the permanent members of the Orpailleur team are involved in teaching at all levels, mainly at Université de Lorraine. Indeed, most members of the Orpailleur team hold “Université de Lorraine” positions.
Responsibility for the 2nd year of the NLP Master's program at the IDMC, Université de Lorraine.
Local coordination of the European Erasmus Mundus Master's program LCT (Language and Communication Technologies).
The LCT Master's program (“Language and Communication Technologies”) is designed to provide students with practice-oriented knowledge in computational and theoretical linguistics, natural language processing, and computer science, to meet the demands of industry and research in these rapidly growing areas. The LCT consortium includes seven European universities, namely Saarland, Lorraine, Trento, Malta, Groningen, Charles University in Prague, and the Basque Country, as well as several partners, e.g., DFKI, IBM (Czech Republic), VICOMTECH, Sony (Europe), IBM (Ireland), and Inria (France).
Responsibility for courses on Artificial Intelligence and Knowledge-Based Systems at TELECOM Nancy, an engineering school in computer science at Université de Lorraine.
The members of the Orpailleur team are also involved in student supervision at all university levels, from under-graduate to post-graduate students, including engineers, PhD students, and postdocs.
Finally, the permanent members of the Orpailleur team are involved in HDR and thesis defenses, being thesis referees or thesis committee members.