Section: New Results


We describe here some complementary work carried out this year in knowledge representation and knowledge engineering. First, we started a new collaboration with the Center for Structural Biochemistry in Montpellier on the translation of Boolean functions into a biological language, so as to help design biological devices that satisfy formal properties. Second, in collaboration with the MAREL team at LIRMM, which designs algorithms for Formal Concept Analysis, we developed a new algorithm for generating text under constraints, implemented in a tool that relies on our tool Cogui. Third, we report some complementary work carried out in the IATE team on the construction of ontologies in the agronomy domain and their use to integrate heterogeneous data, the resulting knowledge base serving as input to a decision support system.

Encoding Boolean Functions in Biological Systems

Participants : Michel Leclère, Federico Ulliana, Guillaume Perution Kihli.

This work was done as part of a new collaboration started in 2017 with the “Centre de Biochimie Structurale (CBS) de Montpellier”, with Sarah Guiziou and Jérôme Bonnet. CBS is interested in developing a framework dedicated to the automatic design of “recombinase” biological systems implementing a Boolean function. Recombinases are genetic enzymes that make it possible to manipulate the structure of genomes and to control gene expression, which is seen as the output of a Boolean function. Such systems can be designed in different ways. In this collaboration, we study the design of biological DNA sequences that are intended to implement a specific Boolean function defined by the expert biologist. On our side, we study the logical expressivity of such systems. Concretely, our goal is to characterize the set of Boolean functions that admit a biological implementation under certain constraints and then, whenever such an implementation exists, to devise a method for automatically constructing the corresponding sequence.

During this first year, we studied and highlighted some characteristic properties of biological sequences, namely equivalence, irreducibility, and simplifiability. We also developed an algorithm to exhaustively explore the set of irreducible and non-simplifiable sequences with n inputs (each of which implements a Boolean function with n variables). This algorithm has been implemented in a distributed way and run on a high-performance cluster. From its outputs, we built a database that associates each Boolean function of up to 4 variables with the different possible sequences implementing it (http://genetix.lirmm.fr).

  • Our first findings are contained in a preliminary report [42]
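To make the notion of a database keyed by Boolean function more concrete, here is a minimal sketch. It assumes, purely for illustration, that each function is identified by its truth table packed into an integer; the names truth_table, count_functions, and register, as well as the sequence identifier, are hypothetical and do not describe the actual schema behind genetix.lirmm.fr. The biological semantics of sequences is not modeled.

```python
from itertools import product


def truth_table(f, n):
    """Pack the truth table of an n-input Boolean function into an integer.

    Bit i is set when f is true on the i-th input row, rows being listed
    in the order produced by itertools.product((False, True), repeat=n).
    """
    code = 0
    for i, row in enumerate(product((False, True), repeat=n)):
        if f(*row):
            code |= 1 << i
    return code


def count_functions(n):
    """Number of distinct Boolean functions of n variables: 2^(2^n)."""
    return 2 ** (2 ** n)


def register(index, f, n, sequence_id):
    """Hypothetical index: (n, truth-table code) -> list of sequence ids."""
    index.setdefault((n, truth_table(f, n)), []).append(sequence_id)


# Two syntactically different descriptions of AND share the same key.
index = {}
register(index, lambda a, b: a and b, 2, "seq-A")  # "seq-A" is a made-up id
and_key = (2, truth_table(lambda a, b: not (not a or not b), 2))
```

With such a canonical key, looking up all known sequences for a given function is a single dictionary access, and the key space stays small for the range covered here (65,536 functions of 4 variables).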

Text Generation Under Constraints on top of Cogui

Participants : Michel Chein, Alain Gutierrez.

We built a tool for building, editing, and reusing large corpora for text generation under constraints. Text is generated by dynamically instantiating templates with terms drawn from a collection of available textual corpora. We developed a database indexing technique based on a sub-order of a Galois lattice (the so-called AOC-poset), which we use to describe the structure of the input texts as well as the terms they contain. Thanks to this index, we can efficiently find terms for the text generation process. The final tool is developed on top of Cogui. Finally, we conducted an experimental evaluation that reports the size and construction time of the indexes (which are built off-line), as well as the performance of text generation.

  • Our results have been published in ISMIS 2017 [27]
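As an illustration of the AOC-poset mentioned above, the following sketch computes it for a tiny formal context using the standard definition: only the concepts introduced by an object or by an attribute are kept, ordered by extent inclusion. The context (texts t1 to t3 as objects, terms a to c as attributes) and all function names are assumptions made for this example; this is not the indexing code of the tool.

```python
def extent(context, attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, has in context.items() if attrs <= has)


def intent(context, objs, all_attrs):
    """Attributes shared by every object in objs."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)


def aoc_poset(context):
    """Return (concepts, order) for the AOC-poset of a formal context.

    concepts: set of (extent, intent) pairs, restricted to object
    concepts and attribute concepts.
    order: pairs (c, d) such that the extent of c is strictly included
    in the extent of d.
    """
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for o in context:  # object concepts: start from the object's attributes
        i = frozenset(context[o])
        concepts.add((extent(context, i), i))
    for a in all_attrs:  # attribute concepts: start from the attribute's objects
        e = extent(context, frozenset([a]))
        concepts.add((e, intent(context, e, all_attrs)))
    order = {(c, d) for c in concepts for d in concepts if c[0] < d[0]}
    return concepts, order


# Hypothetical context: which terms occur in which texts.
context = {"t1": {"a", "b"}, "t2": {"a"}, "t3": {"b", "c"}}
concepts, order = aoc_poset(context)
```

On this context the full concept lattice has six concepts, while the AOC-poset keeps four of them (the top and bottom concepts are introduced by no object or attribute). This size reduction is the usual motivation for preferring the AOC-poset as an index structure.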

Complementary Work on Ontologies for Data Integration in Agronomy

Participant : Patrice Buche.

Here we use ontologies to integrate experimental data across complementary sub-domains in agronomy. Scientific literature in the agronomy field is growing fast and could be a valuable source of data for researchers wishing to address broader research questions, for example comparing the efficiency of the same biomass treatment applied in different contexts. However, scientific data is abundant, mostly in textual format, and heterogeneously structured, all factors that can hinder its systematic reuse. We focused on implementing decision support systems that use ontologies and structured knowledge to integrate scientific data coming from different sources. This led to the definition of a new ontology network, the Agri-Food Experiment Ontology (AFEO), built from two ontological resources, AEO (Ontology for Agricultural Experiments) and OFPE (Ontology for Food Processing Experiments), and of a termino-ontological resource for comparing ligno-cellulosic biomass and agro-waste valorisation routes. We studied methods for linking existing ontologies in life sciences and environment. To extract knowledge from data, we also devised a method for automatically discovering and extracting relevant data, modeled as n-ary relations, from plain text.

  • Results were published in CEAR [17], WCCA 2017 [26], ESA [14], and AKDM [39]

Heterogeneous data integrated thanks to ontology networks are reused in decision support systems (DSS). Two prototypes have been implemented in the domain of food packaging selection for respiring and non-respiring fresh foods. Additionally, our team contributes to international initiatives to suggest ontological standards in sub-domains of agriculture.

  • Results were published in F1000Research [18], Innovations Agronomiques [19], and Packaging Research [16]