SEMAGRAMME

2023 Activity report, Project-Team SEMAGRAMME

RNSR: 201120979K

Research center: Inria Centre at Université de Lorraine
In partnership with: CNRS, Université de Lorraine
Team name: Semantic Analysis of Natural Language
In collaboration with: Laboratoire lorrain de recherche en informatique et ses applications (LORIA)
Domain: Perception, Cognition and Interaction
Theme: Language, Speech and Audio

Keywords

Computer Science and Digital Science

A5.8. Natural language processing
A7.2. Logic in Computer Science
A9.4. Natural language processing

1 Team members, visitors, external collaborators

Research Scientists

Philippe de Groote [Team leader, INRIA, Senior Researcher]
Bruno Guillaume [INRIA, Researcher]
Sylvain Pogodalla [INRIA, Researcher]

Faculty Members

Maxime Amblard [UL, Professor, HDR]
Karën Fort [SORBONNE UNIVERSITE, Associate Professor, UL appointed, HDR]
Jacques Jayez [ENS DE LYON, Emeritus, UL appointed]
Michel Musiol [UL, Professor Delegation, from Sep 2023, HDR]
Guy Perrier [UL, Emeritus, until Jun 2023, HDR]

Post-Doctoral Fellow

Marc Anderson [UL, Post-Doctoral Fellow]

PhD Students

Samuel Buchel [UL, until Sep 2023]
Hee-Soo Choi [UL]
Marie Cousin [UL]
Amandine Decker [UL, University of Gothenburg]
Fanny Ducel [UNIV PARIS SACLAY, from Oct 2023]
Maxime Guillaume [Yseop, CIFRE]
Amandine Lecomte [INRIA]
Chuyuan Li [UL, ATER, until Aug 2023]
Pierre Ludmann [Polytech Nancy (PRCE), until Sep 2023]
Siyana Pavlova [UL, ATER, from Sep 2023]
Siyana Pavlova [UL, until Aug 2023]
Valentin Richard [UL]
Priyansh Trivedi [INRIA, until May 2023]

Technical Staff

Khensa Amani Daoudi [INRIA, Engineer, from Feb 2023]
Bertrand Remy [UL, Engineer, from Feb 2023, ANR Codeine ANR-20-CE23-0026]
Vincent Tourneur [INRIA, Engineer]

Interns and Apprentices

Karolina Boczon [UL, Intern, from Feb 2023 until Aug 2023]
Omar Cherif [INRIA, Intern, from Mar 2023 until Sep 2023]
Fanny Ducel [UL, Intern, from Mar 2023 until Sep 2023]
Julie Halbout [INRIA, Intern, from Feb 2023 until Jul 2023]
Laura Masson-Grehaigne [UL, Intern, from May 2023 until Jul 2023]
Violette Pelgrims [UL, Intern, until Aug 2023]
Jules Steelandt [UL, Intern, from Feb 2023 until Apr 2023, ANR Codeine ANR-20-CE23-0026]

Administrative Assistants

Angelique Marc [INRIA, from Sep 2023]
Anne-Marie Messaoudi [UL]

External Collaborators

Mathieu Constant [UL, HDR]
Chuyuan Li [University of British Columbia, from Sep 2023]
Michel Musiol [UL, until Aug 2023, HDR]

2 Overall objectives

2.1 Scientific Context

Computational linguistics is a discipline at the intersection of computer science and linguistics. On the theoretical side, it aims to provide computational models of the human language faculty. On the applied side, it is concerned with natural language processing and its practical applications.

From a structural point of view, linguistics is traditionally organized into the following sub-fields:

Phonology, the study of the abstract sound systems of languages.
Morphology, the study of word structure.
Syntax, the study of language structure, i.e., the way words combine into grammatical phrases and sentences.
Semantics, the study of meaning at the levels of words, phrases, and sentences.
Pragmatics, the study of the ways in which the meaning of an utterance is affected by its context.

Computational linguistics is concerned with all these fields. Consequently, various computational models, whose application domains range from phonology to pragmatics, have been developed. Among these, logic-based models play an important part, especially at the “highest” levels.

At the level of syntax, generative grammars may be seen as basic inference systems, while categorial grammars are based on substructural logics specified by Gentzen sequent calculi. Finally, model-theoretic grammars amount to sets of logical constraints to be satisfied.

At the level of semantics, the most common approaches derive from Montague grammars, which are based on the simply typed $λ$-calculus and Church's simple theory of types. In addition, various logics (modal, hybrid, intensional, higher-order, etc.) are used to express logical semantic representations.
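To make the Montagovian setting concrete, the following minimal sketch evaluates a quantified sentence in a small finite model, using curried Python functions in place of simply typed $λ$-terms. The domain, the predicate extensions, and all function names are hypothetical and chosen only for illustration.

```python
# Toy model (hypothetical): a domain of individuals and two predicate extensions.
DOMAIN = {"alice", "bob", "carol"}
MAN = {"bob"}
WALKS = {"alice", "bob"}


def man(x):
    """Property of type e -> t."""
    return x in MAN


def walks(x):
    """Property of type e -> t."""
    return x in WALKS


def every(restrictor):
    """Generalized quantifier of type (e -> t) -> (e -> t) -> t."""
    def apply_scope(scope):
        return all(scope(x) for x in DOMAIN if restrictor(x))
    return apply_scope


def some(restrictor):
    """Generalized quantifier of type (e -> t) -> (e -> t) -> t."""
    def apply_scope(scope):
        return any(restrictor(x) and scope(x) for x in DOMAIN)
    return apply_scope


# "Every man walks" is obtained by function application alone.
every_man_walks = every(man)(walks)
```

The meaning of the sentence is computed purely by applying the meaning of the determiner to the meanings of the noun and the verb, which is the compositional pattern that the typed $λ$-calculus makes precise.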

At the level of pragmatics, the situation is less clear. The word pragmatics was introduced by Morris to designate the branch of the philosophy of language that studies, besides linguistic signs, their relation to their users and the possible contexts of use. This definition was not very precise and, for a long time, several authors have considered (and some still consider) pragmatics to be the wastebasket of syntax and semantics. Nevertheless, as far as discourse processing is concerned (which includes pragmatic problems such as pronominal anaphora resolution), logic-based approaches have also been successful. In particular, Kamp's Discourse Representation Theory gave rise to sophisticated “dynamic” logics. The situation, however, is less satisfactory than it is at the semantic level. On the one hand, we are facing a kind of logical “tower of Babel”. The various pragmatic logic-based models that have been developed, while sharing underlying mathematical concepts, differ in several respects and are too often based on ad hoc features. As a consequence, they are difficult to compare and appear more as competitors than as collaborative theories that could be integrated. On the other hand, several phenomena related to discourse dynamics (e.g., context updating, presupposition projection and accommodation, and contextual reference resolution) still lack deep logical explanations. We strongly believe, however, that this situation can be improved by applying to pragmatics the same approach Montague applied to semantics, using the standard tools of mathematical logic.

Accordingly:

The overall objective of the Sémagramme project is to design and develop new unifying logic-based models, methods, and tools for the semantic analysis of natural language utterances and discourses. This includes the logical modeling of pragmatic phenomena related to discourse dynamics. Typically, these models and methods will be based on standard logical concepts (stemming from formal language theory, mathematical logic, and type theory), which should make them easy to integrate.

The project is organized along three research directions (i.e., syntax-semantics interface, discourse dynamics, and common basic resources), which interact as explained below.

Moreover, a transversal and transdisciplinary theme has been developed in the team in the past years: ethics in NLP and more generally in AI.

2.2 Syntax-Semantics Interface

The Sémagramme project intends to focus on the semantics of natural languages (in a wider sense than usual, including some pragmatics). Nevertheless, the semantic construction process is syntactically guided, that is, the construction of logical representations of meaning is based on the analysis of syntactic structures. We do not want, however, to commit ourselves to any specific theory of syntax. Consequently, our approach should be based on an abstract generic model of the syntax-semantics interface.

Here, an important idea of Montague comes into play, namely, the “homomorphism requirement”: semantics must appear as a homomorphic image of syntax. While this idea is almost a truism in the context of mathematical logic, it remains challenged in the context of natural languages. Nevertheless, Montague's idea has been quite fruitful, especially in the field of categorial grammars, where van Benthem showed how syntax and semantics could be connected using the Curry-Howard isomorphism. This correspondence is the keystone of the syntax-semantics interface of modern type-logical grammars. It also motivated the definition of our own Abstract Categorial Grammars [57].

Technically, an Abstract Categorial Grammar simply consists of a (linear) homomorphism between two higher-order signatures. Extensive studies have shown that this simple model allows several grammatical formalisms to be expressed, providing them with a syntax-semantics interface for free [55, 7].
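The homomorphism idea can be sketched in a few lines. The toy below is an illustration under strong simplifying assumptions, not the ACGtk implementation: abstract terms are built from constants and application, and a lexicon maps each constant to an object-level value, with application interpreted as application. The constant names, the two lexicons, and the use of plain Python strings for surface forms and formulas are all hypothetical, and linearity is not enforced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    """A constant of the abstract signature."""
    name: str


@dataclass(frozen=True)
class App:
    """Application of one abstract term to another."""
    fun: object
    arg: object


def interpret(term, lexicon):
    """Homomorphic image of an abstract term under a lexicon.

    Constants are looked up in the lexicon; application is mapped to
    application of the images.
    """
    if isinstance(term, Const):
        return lexicon[term.name]
    return interpret(term.fun, lexicon)(interpret(term.arg, lexicon))


# Two lexicons over the same abstract signature (hypothetical entries).
# LOVES takes its object first and its subject second.
SURFACE = {
    "JOHN": "John",
    "MARY": "Mary",
    "LOVES": lambda obj: lambda subj: f"{subj} loves {obj}",
}
LOGIC = {
    "JOHN": "j",
    "MARY": "m",
    "LOVES": lambda obj: lambda subj: f"love({subj}, {obj})",
}

# Abstract term LOVES MARY JOHN, i.e. the parse structure of "John loves Mary".
abstract = App(App(Const("LOVES"), Const("MARY")), Const("JOHN"))
surface_form = interpret(abstract, SURFACE)
logical_form = interpret(abstract, LOGIC)
```

Because both lexicons interpret the same abstract term, the surface form and the logical form are related through the shared abstract structure, which is the sense in which a syntax-semantics interface comes for free.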

We intend to carry on with the development of the Abstract Categorial Grammar framework. At the foundational level, we will define and study possible type theoretic extensions of the formalism, in order to increase its expressive power and its flexibility. At the implementation level, we will continue the development of an Abstract Categorial Grammar support system.

As said above, considering the syntax-semantics interface as the starting point of our investigations allows us not to be committed to some specific syntactic theory. The Montagovian syntax-semantics interface, however, cannot be considered to be universal. In particular, it does not seem to be well adapted to dependency and model-theoretic grammars. Consequently, in order to be as generic as possible, we intend to explore alternative models of the syntax-semantics interface. In particular, we will explore relational models where several distinct semantic representations can correspond to the same syntactic structure.

2.3 Discourse Dynamics

It is well known that the interpretation of a discourse is a dynamic process. Take a sentence occurring in a discourse. On the one hand, it must be interpreted according to its context. On the other hand, its interpretation affects this context, and must therefore result in an update of the current context. For this reason, discourse interpretation is traditionally considered to belong to pragmatics. The boundary between pragmatics and semantics, however, is not that clear.

As we mentioned above, we intend to apply to some aspects of pragmatics (mainly, discourse dynamics) the same methodological tools Montague applied to semantics. The challenge here is to obtain a completely compositional theory of discourse interpretation, by respecting Montague's homomorphism requirement. We think that this is possible by using techniques coming from programming language theory, in particular, continuation semantics, and the related theories of functional control operators.

We have indeed successfully applied such techniques in order to model the way quantifiers in natural languages may dynamically extend their scope [56]. We intend to tackle, in a similar way, other dynamic phenomena (typically, anaphora and referential expressions, presupposition, modal subordination, etc.).
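A continuation-based treatment of this kind can be sketched as follows. In this minimal illustration, a sentence is a function taking a context (a list of available referents) and a continuation (the rest of the discourse), and sequencing two sentences threads the updated context from one to the other. The finite model, the "most recent referent" resolution strategy for the pronoun, and all names are assumptions made for the example; they are not claimed to be the formulation used in the cited work.

```python
# Hypothetical finite model.
DOMAIN = ["john", "bill", "mary"]
MAN = {"john", "bill"}
WALKS = {"john"}
SMILES = {"john", "mary"}
TALKS = {"bill"}


def a(noun, verb):
    """'A NOUN VERBs': introduces a referent and passes it to the continuation."""
    def sentence(context, continuation):
        return any(
            x in noun and x in verb and continuation(context + [x])
            for x in DOMAIN
        )
    return sentence


def he(verb):
    """'He VERBs': picks the most recent referent from the context."""
    def sentence(context, continuation):
        return bool(context) and context[-1] in verb and continuation(context)
    return sentence


def seq(first, second):
    """Discourse sequencing: 'second' is evaluated inside the continuation of 'first'."""
    return lambda context, continuation: first(
        context, lambda updated: second(updated, continuation)
    )


def true_in_model(discourse):
    """Close off a discourse with the empty context and the trivial continuation."""
    return discourse([], lambda context: True)
```

In `seq(a(MAN, WALKS), he(SMILES))`, the existential quantifier of the first sentence takes the second sentence in its scope through the continuation, so the pronoun can refer to the man introduced earlier without any change to the composition rules.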

What characterizes these different dynamic phenomena is that their interpretations need information to be retrieved from a current context. This raises the question of the modeling of the context itself. At a foundational level, we have to answer questions such as the following. What is the nature of the information to be stored in the context? What are the processes that allow implicit information to be inferred from the context? What are the primitives that allow a context to be updated? How does the structure of the discourse and the discourse relations affect the structure of the context? These questions also raise implementation issues. What are the appropriate data types? How can we keep the complexity of the inference algorithms sufficiently low?

2.4 Common Basic Resources

Even if our research primarily focuses on semantics and pragmatics, we nevertheless need syntax. More precisely, we need syntactic trees to start with. We consequently need grammars, lexicons, and parsing algorithms to produce such trees. In recent years, we have developed the notions of interaction grammar [58] and graph rewriting [3, 4] as models of natural language syntax. This includes the development of grammars for French [67], together with morphosyntactic lexicons. We intend to continue this line of research and development. In particular, we want to increase the coverage of our grammars for French, and provide our parsers with more robust algorithms.

Further primary resources are needed in order to put to work a computational semantic analysis of utterances and discourses. As we want our approach to be as compositional as possible, we must develop lexicons annotated with semantic information. This opens up the wide research area of lexical semantics.

Finally, when dealing with logical representations of utterance interpretations, the need for inference facilities is ubiquitous. Inference is needed in the course of the interpretation process, but also to exploit the result of the interpretation. Indeed, an advantage of using formal logic for semantic representations is the possibility of using logical inference to derive new information. From a computational point of view, however, logical inference may be highly complex. Consequently, we need to investigate which logical fragments can be used efficiently for natural language oriented inference.

3 Research program

3.1 Overview

The research program of Sémagramme aims to develop models based on well-established mathematics. We seek two main advantages from this approach. On the one hand, by relying on mature theories, we have at our disposal sets of mathematical tools that we can use to study our models. On the other hand, developing various models on a common mathematical background will make them easier to integrate, and will ease the search for unifying principles.

The main mathematical domains on which we rely are formal language theory, symbolic logic, and type theory.

3.2 Formal Language Theory

Formal language theory studies the purely syntactic and combinatorial aspects of languages, seen as sets of strings (or possibly trees or graphs). Formal language theory has been especially fruitful for the development of parsing algorithms for context-free languages. We use it, in a similar way, to develop parsing algorithms for formalisms that go beyond context-freeness. Language theory also appears to be very useful in formally studying the expressive power and the complexity of the models we develop.

3.3 Symbolic Logic

Symbolic logic (and, more particularly, proof theory) is concerned with the study of the expressive and deductive power of formal systems. In a rule-based approach to computational linguistics, the use of symbolic logic is ubiquitous. As we previously said, at the level of syntax, several kinds of grammars (generative, categorial...) may be seen as basic deductive systems. At the level of semantics, the meaning of an utterance is captured by computing (intermediate) semantic representations that are expressed as logical forms. Finally, using symbolic logics allows one to formalize notions of inference and entailment that are needed at the level of pragmatics.

3.4 Type Theory and Typed Lambda-Calculus

Among the various possible logics that may be used, Church's simply typed $λ$-calculus and simple theory of types (also known as higher-order logic) play a central part. On the one hand, Montague semantics is based on the simply typed $λ$-calculus, and so is our syntax-semantics interface model. On the other hand, as shown by Gallin, the target logic used by Montague for expressing meanings (i.e., his intensional logic) is essentially a variant of higher-order logic featuring three atomic types (the third atomic type standing for the set of possible worlds).

4 Application domains

4.1 Deep Semantic Analysis

Our application domains are natural language processing applications that rely on a deep semantic analysis. Examples include:

textual entailment and inference,
dialogue systems,
semantic-oriented query systems,
content analysis of unstructured documents,
(semi) automatic knowledge acquisition,
discourse structure analysis (argumentative relations, discourse markers),
lexical resources.

4.2 Text Transformation

Text transformation is an application domain featuring two important sub-fields of computational linguistics:

parsing, from surface form to abstract representation,
generation, from abstract representation to surface form.

Text simplification and automatic summarization belong to this domain.

To this end, we aim to use the Abstract Categorial Grammar framework that we develop. It is indeed a reversible framework that allows both parsing and generation. Its underlying mathematical structure, the $λ$-calculus, makes it fit with our type-theoretic approach to discourse dynamics modeling.

4.3 Types for discourse markers

While there is a rich descriptive literature on Discourse Markers (DM), for instance words or expressions like so or yet in English, the question of their representation in type systems is understudied. In addition to basic types such as individuals or events, or simple functional types (properties, etc.), DM are known to operate on domains like states of affairs, beliefs, or speech acts. The entities inhabiting these domains are themselves complex. For instance, speech acts involve discourse planning in the form of a network of intentions and actions. Moreover, DM can combine with one another, forming clusters whose meaning is not always apparent from the meanings of the component DM. Within the context of the ANR CODIM, we aim to develop a typing system that (i) takes into account the array of types denoted by DM and (ii) addresses the question of the semantic nature of their combinations.

5 Social and environmental responsibility

5.1 Footprint of research activities

ANR InExtenso:

WP4 of the project is dedicated to evaluating the environmental impact of LLMs. More precisely, it aims to propose a method for measuring the environmental impact of digital health and to use it in the project evaluations and beyond.

6 Highlights of the year

Sémagramme organized the 15th International Conference on Computational Semantics (IWCS), the annual conference of the special interest group on computational semantics of the ACL, held in Nancy on 20-23 June 2023.

7 New software, platforms, open data

7.1 New software

7.1.1 ACGtk

Name:
Abstract Categorial Grammar Development Toolkit
Keywords:
Natural language processing, NLP, Syntactic analysis, Semantics
Scientific Description:

Abstract Categorial Grammars (ACG) are a grammatical formalism in which grammars are based on typed lambda-calculus. A grammar generates two languages: the abstract language (the language of parse structures), and the object language (the language of the surface forms, e.g., strings, or higher-order logical formulas), which is the realization of the abstract language.

ACGtk provides two software tools to develop and to use ACGs: acgc, which is a grammar compiler, and acg, which is an interpreter of a command language that allows one, in particular, to parse and realize terms.
Functional Description:
ACGtk provides a piece of software for developing and using Abstract Categorial Grammars (ACG).
Release Contributions:
New version of the software that provides a new command language and new functionalities for the interpreter (completion, sorting of parsing structures, use of the magic set rewriting technique for parsing). The compiler now accepts UTF-8, and a new syntax for identifiers, binders, and operators was defined.
URL:
https://gitlab.inria.fr/ACG/dev/ACGtk
Publications:
hal-01242154, hal-01328702, tel-01412765, inria-00112956, inria-00100529
Contact:
Sylvain Pogodalla
Participants:
Philippe De Groote, Pierre Ludmann, Jiri Marsik, Sylvain Pogodalla, Vincent Tourneur

7.1.2 Grew

Name:
Graph Rewriting
Keywords:
Semantics, Syntactic analysis, NLP, Graph rewriting
Functional Description:
Grew is a Graph Rewriting tool dedicated to applications in NLP. Grew takes into account confluent and non-confluent graph rewriting and it includes several mechanisms that help to use graph rewriting in the context of NLP applications (built-in notion of feature structures, parametrization of rules with lexical information).
News of the Year:
In 2023, 4 new versions were released. The main new features are: non-injective matching, a new "with" keyword for positive filtering, a syntax extension to express disjunction on node matching.
URL:
https://grew.fr/
Publications:
hal-03724068, hal-03177701, hal-01930591, hal-01814386, hal-03021720, hal-04387830, hal-04387852, hal-03724129, hal-03846825
Contact:
Bruno Guillaume
Participants:
Bruno Guillaume, Guy Perrier, Guillaume Bonfante
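The kind of graph rewriting involved can be illustrated with a small sketch in plain Python. It does not use Grew's rule syntax or API; the edge encoding, the function name, and the rule itself (turning an auxiliary into the governor of its verb) are hypothetical and only meant to show what matching a pattern and rewriting a dependency graph amounts to.

```python
def reverse_aux(edges):
    """Apply a hypothetical rule: V -[aux]-> A becomes A -[comp:aux]-> V.

    Edges are (governor, label, dependent) triples. Edges that pointed to V
    are shifted to A so that the graph stays connected from the root.
    """
    result = set(edges)
    for verb, label, aux in sorted(edges):
        if label != "aux":
            continue  # the pattern only matches 'aux' edges
        # Shift every edge entering the verb so that it enters the auxiliary.
        for gov, lab, dep in sorted(result):
            if dep == verb:
                result.remove((gov, lab, dep))
                result.add((gov, lab, aux))
        # Replace the matched edge by its reversed, relabelled counterpart.
        result.discard((verb, "aux", aux))
        result.add((aux, "comp:aux", verb))
    return result


# Dependency graph for "Mary has slept" (toy annotation).
before = {
    ("ROOT", "root", "slept"),
    ("slept", "nsubj", "Mary"),
    ("slept", "aux", "has"),
}
after = reverse_aux(before)
```

A dedicated tool adds what this sketch lacks, such as a declarative pattern language, feature structures on nodes, and control over the order in which rules are applied.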

7.1.3 SLODiM

Name:
SLODiM
Keywords:
NLP, Discourse, Dialogue, French
Functional Description:
SLODiM is a software package for the analysis of oral French. It is more particularly developed to allow the analysis of interviews with clinicians in order to identify language behaviours characteristic of mental pathologies.
Release Contributions:

The latest version integrates new processing steps, in particular for the identification of backchannels.

A version without the graphical representations is available without an account. Its purpose is to show the processing performed by the system.
News of the Year:
Implementation of new analyses, facilitation of automatic use on various corpora of spontaneous speech.
URL:
https://academics.slodim.loria.fr/
Contact:
Maxime Amblard
Partners:
Loria, Université de Lorraine, CNRS

7.1.4 HostoMytho

Keywords:
Semantic annotation, Annotation tool, Semantic, Medical applications, NLP
Functional Description:
HostoMytho is a GWAP, or "game with a purpose", developed within the framework of the CODEINE ANR project. The aim of the game is to allow users to annotate automatically generated medical files, in order to evaluate their plausibility (quality of the language and medical semantics) and to add different layers of information (negation, hypothesis, time, etc.). HostoMytho is multiplatform.
Contact:
Bertrand Remy
Partners:
LISN, CEA-List

7.1.5 Arborator-Grew

Name:
Arborator's Collaborative Annotation
Keywords:
Annotation tool, Syntactic analysis
Functional Description:
The online interface allows managing collaborative annotation projects in dependency syntax. It is possible to use Grew queries and also to directly rewrite graphs in the annotation tool.
News of the Year:
Several features were developed during 2023: redefinition of the user roles and access types, integration of a labeling system to organize the annotation workflow, implementation of full synchronization between GitHub and Arborator-Grew, improvement of existing features (such as Lexicon, Grew search, and Grew rule editing), enhancement of the parser UI, fixes for bugs and issues reported by users, and development of user documentation.
URL:
https://arborator.github.io/
Publication:
hal-03021720
Contact:
Bruno Guillaume
Participants:
Khensa Amani Daoudi, Bruno Guillaume, Gael Guibon, Kim Gerdes, Kirian Guiller
Partners:
Université Paris Nanterre, LIMSI, LISN

7.2 Open data

7.2.1 Morphosyntactic Treebanks

Participants: Bruno Guillaume, Valentin Richard, Guy Perrier.

Sémagramme actively participates in the maintenance and the development of morphosyntactic treebanks in the Surface Syntactic Universal Dependencies (SUD) and Universal Dependencies (UD) projects.

Several French treebanks are involved:

A treebank of French sentences containing interrogative clauses, called French Interrogative Bank (UD_French-FIB) was extracted from French UD treebanks using the program FUDIA, designed for this purpose.

In 2023, by collaborating with linguists and field linguists in the Autogramm project, Sémagramme is also helping to develop treebanks for new languages (Haitian Creole and Hausa) and to improve and extend existing treebanks (Beja, Naija and Zaar).

In Haitian Creole: SUD_Haitian_Creole-Autogramm
In Hausa: SUD_Hausa-Autogramm
In Beja: SUD_Beja-NSC
In Naija: SUD_Naija-NSC
In Zaar: SUD_Zaar-Autogramm

8 New results

8.1 Syntax-Semantics Interface

Participants: Maxime Amblard, Marie Cousin, Philippe de Groote, Bruno Guillaume, Maxime Guillaume, Pierre Ludmann, Sylvain Pogodalla, Siyana Pavlova, Valentin Richard, Priyansh Trivedi.

8.1.1 Abstract Categorial Grammars

Feature Structure

ACG has proven to be a powerful framework with well-defined theoretical properties. It was, however, lacking a facility which is useful and widely used for grammar engineering: feature structures. The latter are often used to express in a concise way some combinatorial properties related to morphosyntactic properties of expressions, for instance subject-verb agreement.

We worked on extending the ACG type system to provide a generic feature structure framework. This extension relies on a restricted addition of product types (records) and dependent types. We also considered the reduction of grammars using this extension to Datalog programs (Datalog is used to implement ACG parsing in ACGtk, see Sec. 7).

An experiment using Yseop's actual proprietary grammars and the ACG system extended with features resulted in a significant improvement: the grammar became smaller and text generation became more efficient [37].

Multityped ACG (mACG) and Weighted ACG

Symbolic parsing with large coverage grammars usually leads to combinatorial explosion of syntactic ambiguities (a single expression has many syntactic analyses). A widespread method to tackle this issue is to use statistics and probabilities, leading for instance to probabilistic Context Free Grammars (pCFGs) and probabilistic Tree Adjoining Grammars (pTAGs). An important goal is then to also extend ACGs with probabilities or weights.
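The way probabilities help to rank competing analyses can be sketched for the context-free case. In the minimal example below, the probability of a derivation is the product of the probabilities of the rules it uses, and two analyses of the same ambiguous string are compared. The grammar, the rule probabilities, and the example sentence are invented for illustration, and lexical rules are given probability 1 to keep the sketch short.

```python
# Hypothetical rule probabilities; for each left-hand side they sum to 1.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("VP", "PP")): 0.4,
    ("NP", ("Pro",)): 0.2,
    ("NP", ("Det", "N")): 0.5,
    ("NP", ("NP", "PP")): 0.3,
    ("PP", ("P", "NP")): 1.0,
}


def derivation_probability(tree):
    """Product of the probabilities of all rules used in a derivation tree.

    A tree is a tuple (label, child, ...); a preterminal has a single
    string child and counts as a lexical rule of probability 1.
    """
    label, *children = tree
    if all(isinstance(child, str) for child in children):
        return 1.0
    rhs = tuple(child[0] for child in children)
    probability = RULE_PROB[(label, rhs)]
    for child in children:
        probability *= derivation_probability(child)
    return probability


# Two analyses of "she saw the man with a telescope".
SUBJECT = ("NP", ("Pro", "she"))
THE_MAN = ("NP", ("Det", "the"), ("N", "man"))
WITH_TELESCOPE = ("PP", ("P", "with"), ("NP", ("Det", "a"), ("N", "telescope")))

vp_attachment = (
    "S", SUBJECT,
    ("VP", ("VP", ("V", "saw"), THE_MAN), WITH_TELESCOPE),
)
np_attachment = (
    "S", SUBJECT,
    ("VP", ("V", "saw"), ("NP", THE_MAN, WITH_TELESCOPE)),
)

best = max([vp_attachment, np_attachment], key=derivation_probability)
```

With these toy numbers the verb-phrase attachment is preferred; a parser can use such scores to prune or rank analyses instead of enumerating all of them.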

Yet, ACGs come with features that make this extension non-trivial. In particular, ACGs can be composed by making the parse structures of one grammar the surface structures of another. The resulting composition is a full-fledged ACG. Because adding weights to ACGs naturally leads to refining the admissible abstract structures (each associated with a weight), ACG composition no longer corresponds to functional composition. We introduced multityped ACGs (mACGs) [61] to support the weighting extension, showing that a suitable notion of composition can be defined for multityped ACGs as well. We further extended this work by introducing weighted ACGs, showing that a suitable notion of composition can still be defined and that weighted ACGs can encode hidden Markov models (not yet published).

Encoding of Meaning-Text Theory Into ACGs

Meaning-Text Theory (MTT) is a linguistic theory geared towards generating natural language expressions from semantic representations [62]. It relies on seven representation levels (e.g., semantics, deep syntax, surface syntax, etc.). Structures at each level are related to structures at the adjacent levels by rewriting devices. MTT uses the key concept of paraphrase, especially in these rewriting devices. ACGs come with several composition modes, one of which corresponds to the transduction of (tree or graph) structures. We used this ability to study the extent to which the MTT architecture as well as the paraphrase operation can be encoded into ACGs. We showed that, while some of the MTT mechanisms as well as some linguistic phenomena can be faithfully accounted for within ACGs, others require additional mechanisms [28, 16].

8.1.2 Formal semantics of dependency relations

We have laid the foundations of a formal semantic theory for dependency grammars [42] by exploiting the following principles:

Dependency relations are represented as binary functions of type $α \to β \to α$ (or $β \to α \to α$), where $α$ is the syntactic category of the governor and $β$ is the syntactic category of the governee.
By virtue of a coherence principle, the different ways of encoding a dependency structure by means of a $λ$-term should all give rise to the same semantic interpretation.
The semantic interpretation of a syntactic category cat is a type of the form $β_{1} \to \dots \to β_{n} \to α$, where $α$ is the Montagovian interpretation of cat and $β_{1}, \dots, β_{n}$ are the Montagovian interpretations of the syntactic categories of the phrases whose heads can potentially be governed by the head of a phrase of category cat.
Saturating operators allow phrases to recover their usual Montagovian interpretations.
Verbs and verbal phrases are semantically interpreted as sets of sets of events.

The resulting theory is fully compositional, allows for a treatment of quantification, and is robust in the sense that it supports the interpretation of partial dependency structures. We have then demonstrated that our theory allows one to deal with more advanced syntactic phenomena [38]. We considered two cases: relative clauses (which depend on the acl:relcl dependency relation) and open clausal complements (which depend on the xcomp dependency relation).
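As a hedged illustration of the first two principles listed above, consider a simple transitive sentence. The relation names $\mathit{nsubj}$ and $\mathit{obj}$, the categories $v$ and $np$, and the example itself are assumptions made for this sketch rather than material taken from the cited papers.

```latex
\begin{align*}
\mathit{nsubj},\ \mathit{obj} &: v \to np \to v \\
t_1 &= \mathit{nsubj}\,(\mathit{obj}\ \mathit{loves}\ \mathit{John})\ \mathit{Mary} : v \\
t_2 &= \mathit{obj}\,(\mathit{nsubj}\ \mathit{loves}\ \mathit{Mary})\ \mathit{John} : v \\
\llbracket t_1 \rrbracket &= \llbracket t_2 \rrbracket
  \qquad \text{(coherence principle)}
\end{align*}
```

Both terms encode the same dependency structure for "Mary loves John" and differ only in the order in which the two dependents are attached to the verb; the coherence principle requires their interpretations to coincide.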

8.1.3 Lexical Semantics and Linguistic Knowledge

Discourse Markers are lexical items which, like so or well in English, are used to organize discourse or to manifest epistemic or affective states of the speaker. Their semantic contribution is difficult to characterize in a sufficiently general way, in particular because they denote entities with a rich semantic structure, such as beliefs or speech acts. Preliminary results suggest that it is necessary to view their semantic update potential as involving intentions, actions and, generally speaking, dynamic modalities. The question then arises of which formal languages or systems are the most appropriate to capture these features (modal logic, Belief-Desire-Intention systems, interactive semantics à la Ginzburg, monads to mimic side effects, etc.).

8.1.4 Semantic Representation

We have been studying and comparing different existing semantic representation formalisms (Abstract Meaning Representation (AMR), Discourse Representation Theory (DRT), Uniform Meaning Representation (UMR), to name but a few). The aim is to determine, through theoretical and empirical studies, whether these formalisms are compatible and whether they encode the same level of semantic information. This gives us a better basis for discussing what we want semantic representation to be, and lays the groundwork for a unifying formalism.

In [24], an empirical comparison of two of the most popular semantic representation formalisms, AMR and Discourse Representation Structures (DRS), is presented. Its goal is to explore the viability of transforming graphs from one framework into the other to construct parallel datasets, and also to gain an understanding of their similarities and differences through the use of annotated data. As no freely available parallel dataset covering both formalisms exists, a sample of 200 sentences already annotated in DRS was taken and also annotated in AMR. A corpus-based approach was then used to build a graph rewriting system from DRS to AMR. This gives insights into where the two formalisms differ in encoding various semantic phenomena.

In [25], a set of structural (encoding of negation, temporal information, modality, etc.) and global (universality, scalability, etc.) features to consider when designing formalisms, and against which formalisms can be assessed, is proposed. A comparison of eight semantic representation formalisms, some graph-based and some logic-based, across the proposed features is provided, complemented by a comparison across the features of a more entailment-oriented corpus (FraCaS).

In [32], a first version of a layered approach to semantic representation is presented. It is motivated by the need for a semantic representation formalism which can encode a rich variety of semantic phenomena, while remaining simple to annotate and easy to read. The representation derives its core predicate-argument structure from Abstract Meaning Representation (AMR). The structure is then extended to encode "features", where each feature is considered as a layer representing a semantic phenomenon that is to be encoded. This approach makes it easy to exclude layers from the representation, while still being able to represent phenomena such as scope that are difficult for most graph-oriented formalisms.

8.1.5 Semantics of questions

Natural language statements are composed not only of declarative sentences but also of interrogative ones. Moreover, sentences cannot be categorized into purely declarative or purely interrogative sentences. Typically, a declarative statement may contain an indirect interrogative clause:

(a)
I don't know where Mary is.

Conversely, a direct interrogative clause may contain a declarative subordinate:

(b)
Do you know that Mary is here?

This interaction between declarative and interrogative clauses is particularly present in dialogues, where the logical notion of answerhood is as significant as that of inference.

In order to tackle this issue from a formal standpoint, we investigated the properties and possible uses of inquisitive semantics, which is a formal semantic theory based on a logic that provides a uniform treatment of both declarative and interrogative expressions. Valentin Richard is currently working on a semantic model for the presuppositions and dynamic referential effects of interrogative pronouns. Some further investigation is required to understand the interaction between these presuppositions and possibility modals.
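The uniform treatment mentioned above can be illustrated with a small sketch. It follows the textbook presentation of basic inquisitive semantics, in which a proposition is a downward-closed set of information states (sets of possible worlds) and a sentence is inquisitive when it has more than one maximal state. The two-world model and all function names are assumptions made for illustration; this is not the team's model of presuppositions or interrogative pronouns.

```python
from itertools import combinations


def powerset(worlds):
    """All subsets of `worlds` as frozensets, including the empty state."""
    items = sorted(worlds)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}


def downward_closure(states):
    """Close a set of information states under subsets."""
    closed = set()
    for state in states:
        closed |= powerset(state)
    return closed


def declarative(worlds_where_true):
    """Proposition of a plain statement: every state that establishes it."""
    return downward_closure({frozenset(worlds_where_true)})


def negation(prop, all_worlds):
    """States that share no world with the informative content of `prop`."""
    info = frozenset().union(*prop)
    return downward_closure({frozenset(all_worlds) - info})


def disjunction(p, q):
    """Inquisitive disjunction: a state supports it if it supports p or q."""
    return p | q


def polar_question(prop, all_worlds):
    """'Is it the case that prop?' rendered as prop-or-not-prop."""
    return disjunction(prop, negation(prop, all_worlds))


def alternatives(prop):
    """Maximal states of a proposition."""
    return {s for s in prop if not any(s < t for t in prop)}


def is_inquisitive(prop):
    """A proposition raises an issue when it has several alternatives."""
    return len(alternatives(prop)) > 1


# Hypothetical model with two worlds: Mary is here, or Mary is away.
WORLDS = {"here", "away"}
mary_is_here = declarative({"here"})
is_mary_here = polar_question(mary_is_here, WORLDS)
print(is_inquisitive(mary_is_here))   # False
print(is_inquisitive(is_mary_here))   # True
```

Statements and questions are objects of the same type here, which is what allows them to embed one another as in examples (a) and (b).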

8.2 Discourse Dynamics

Participants: Maxime Amblard, Philippe de Groote, Jacques Jayez, Chuyuan Li, Michel Musiol.

8.2.1 Dialogue Modeling

Together with Chloé Braud (IRIT), Maxime Amblard and Chuyuan Li continued to work on modeling discourse. Discourse processing suffers from data sparsity, especially for dialogues. As a result, they have explored approaches to build discourse structures for dialogues, based on attention matrices from pretrained language models (PLMs). They have investigated multiple tasks for fine-tuning and shown that the dialogue-tailored Sentence Ordering task performs best. To locate and exploit discourse information in PLMs, they have proposed an unsupervised and a semi-supervised method. These methods achieve encouraging results on the STAC corpus, with F1 scores of 57.2 and 59.3 for the unsupervised and semi-supervised methods, respectively. When restricted to projective trees, the scores improved to 63.3 and 68.1, see 23.
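As a rough illustration of reading a discourse structure off an attention matrix, the sketch below attaches each utterance to the earlier utterance with the highest score and checks whether the resulting tree is projective (no crossing arcs). The greedy attachment rule, the score matrix and the function names are assumptions made for this example; the method actually used in 23 may differ.

```python
def greedy_heads(scores):
    """Attach each utterance to its best-scoring earlier utterance.

    scores[i][j] is assumed to be an attention-derived score for
    attaching utterance i to utterance j. Utterance 0 is the root.
    """
    heads = [None]
    for i in range(1, len(scores)):
        heads.append(max(range(i), key=lambda j: scores[i][j]))
    return heads


def is_projective(heads):
    """True when no two arcs (head, dependent) cross each other."""
    arcs = [(h, d) for d, h in enumerate(heads) if h is not None]
    for h1, d1 in arcs:
        for h2, d2 in arcs:
            if h1 < h2 < d1 < d2:
                return False
    return True


# Hypothetical scores for a four-utterance dialogue.
scores = [
    [0.0, 0.0, 0.0, 0.0],
    [0.9, 0.0, 0.0, 0.0],
    [0.2, 0.8, 0.0, 0.0],
    [0.7, 0.1, 0.2, 0.0],
]
heads = greedy_heads(scores)
print(heads, is_projective(heads))
```

Because every utterance attaches to an earlier one, the output is always a tree rooted at the first utterance; restricting the search to projective trees would require a dedicated decoding algorithm rather than this greedy choice.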

They also started working on the design of a full discourse parser 22. Discourse analysis plays a crucial role in Natural Language Processing (NLP) and has demonstrated its usefulness in various downstream applications like summarization and question answering. In this work, they studied discourse in dialogues, an under-explored setting owing to significant data scarcity. They conducted discourse parsing within a pipeline: first predicting the discourse structure, and then identifying the relations within the structure. Using only 50 examples as gold training data, the methods achieve competitive results compared to the supervised state of the art in-domain and much stronger performance cross-domain, with better stability as well.

Maxime Amblard and Amandine Decker continue to work on topic shift modeling 17. Topics play an important role in dialogue coherence, as what is currently discussed constrains the possible contributions of the participants, and initiating a topic while the previous one is still under discussion may be confusing without appropriate signals. However, the way to actually define the notion of topic is debated in linguistics and not sufficiently discussed in dialogue modeling. A precise description of topics and topic shifts in conversation would contribute to a better understanding of what makes us judge a sequence of utterances to be coherent. In order to analyze different types of topic shifts, they proposed to create a corpus of written task-oriented conversations (discussion of the ethical dilemma of the balloon task), where the dialogues happen by message exchanges. Such a controlled setting where the main topic is fixed, and subtopics are more easily identifiable, could be very helpful when it comes to understanding how people change the topic and react to topic shifts in dialogues.

8.2.2 Discourse Markers

Jacques Jayez is currently working with Mathilde Dargnat (ATILF), Paola Herreño (Ph. D. candidate ATILF-LLF) and Maeva Sillaire (Ph. D. candidate ATILF) on the semantic representation of D(iscourse) M(arkers). DM are words or expressions like so or well in English which help structure discourse or communicate speakers' internal epistemic or affective states. The domain-based approach initiated in the 90s consists in defining different types (aka domains) of semantic objects, like states of affairs, beliefs or speech acts. Such objects have a rich internal structure, which calls for a sufficiently expressive representation. For instance, speech acts involve discourse planning. Moreover, DM can impact various layers of meaning (propositional content, presupposed content, etc.). Our goal is to implement formally the domain intuition by using tools from languages with a flexible subtyping mechanism (OCaml) and to investigate whether the notion of monad, familiar in Haskell, can help us to characterize some 'side effects' of DM.
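To make the monad idea concrete, here is a minimal writer-style sketch in which a meaning carries its main content together with a log of side effects contributed by discourse markers. Python stands in for OCaml or Haskell purely for illustration, and the Meaning type, the unit and bind functions, and the toy entry for so are hypothetical; they are not the representation developed in the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meaning:
    """Main content plus side effects accumulated along the way."""
    content: str
    side_effects: tuple = ()


def unit(content):
    """Wrap a plain content with no side effect."""
    return Meaning(content)


def bind(meaning, f):
    """Feed the content to f and concatenate the side-effect logs."""
    out = f(meaning.content)
    return Meaning(out.content, meaning.side_effects + out.side_effects)


def so(consequence):
    """Toy entry for 'so': passes on the consequence and records,
    as a side effect, that it is presented as following from the premise."""
    def attach(premise):
        note = f"'{consequence}' is presented as following from '{premise}'"
        return Meaning(consequence, (note,))
    return attach


result = bind(unit("it is raining"), so("the match is cancelled"))
print(result.content)
print(result.side_effects)
```

The point of the design is that the main content is computed compositionally while the marker's contribution travels in a separate channel, so different layers of meaning stay distinguishable.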

Jacques Jayez is also working on the argumentative dimension of discourse, using a combination of a standard Bayesian approach and game-theoretical notions 48, 39.

8.2.3 Pathological Discourse Modeling

Michel Musiol is once again part-time delegate at Inria (délégation nationale SHS) for the period 2023-2024. This proximity has enabled us to pursue an active collaboration on the modeling of pathological and clinical discourse.

In the context of the MePheSTO project (Digital Phenotyping for Psychiatric Disorders from Social Interaction - DFKI-Inria AI project), a multimodal perspective on this issue has been introduced. This includes the development of an interlocutory model of what might be termed a "therapeutic effect" in clinical/psychopathological interviewing, with regard to the cognitive-discursive profile and oculomotor behavior of participants (Amandine Lecomte's thesis, forthcoming in 2024).

Indeed, based on a previous study addressing the issue of cognitive impairment bypass based on the dynamics of the repetition process 59 and another modeling visual attention in the psychologist-schizophrenic patient interaction 66 as well as on the basis of new data (for example, from subjects with ultra-high risk of developing psychosis or subjects who have experienced a major depressive episode 12, in the MePheSTO project), Maxime Amblard, Michel Musiol and Amandine Lecomte have explored the dependent relationships between supportive visual behavior and interaction dynamics, on the one hand, and the relationships between mental pathology and oculomotor disorders, on the other hand.

In addition, in 64 they have sketched a methodology for analyzing speech disorders, which will have the particularity of helping to select the discontinuous sequences most likely to carry thought disorders. They have anticipated the development of a modeling system based on the principles of pragmatic linguistics and formal semantics, which, when applied to carefully selected discontinuous discourse sequences 63, will have a good chance of revealing the nature of the underlying thought disorders. They compared the conjectures with the results of an earlier study on the discovery of four "proven" types of discontinuous sequences, and showed which of these sequences can therefore be considered carrying thought disorders.

They also analyzed these sequences by testing certain principles of semantic modeling in order to identify the nature of the disorders and thought operations underlying the discontinuous sequences concerned 65, 64. They show that discursive thought disorders should not be considered simply as the expression of a dysexecutive syndrome, but also as a device likely to affect more complex thought operations such as inferences involved in the conversational context representation system, in semantic memory and in calculating the meaning of utterances or the speaker's meaning.

Improving the heuristics of formal systems for recognizing speech disorders and interpreting thought disorders on the basis of more appropriate and accurate semantic modeling may lead to the development of more discriminating and effective diagnostic tools. The formatting of the formal systems they have achieved will make it possible to represent the interlocutory structure of the disorder more and more accurately in its natural context of expression (speech), and should lead to the development of computerized diagnostic aids.

Finally, the increased precision of formal modeling applied to communication disorders should also make it possible to test the hypothesis that certain discourse configurations are related to thought disorders (in the broad sense), while others reveal cognitive dysfunctions that have more to do with the conditions of the possibility of discourse.

In this line, Michel Musiol has also updated MOS-SF36 norms in the young French population 68.

This program focuses on accurate and rapid diagnosis, as well as long-term therapeutic follow-up. These are major challenges for contemporary psychopathology. Thanks to modeling 59, 60 and computing 54, this work is developing multimodal methodologies for investigating symptoms (language, speech, neuropsychological and cognitive processes, eye movements and visual attention) that are sufficiently accurate to give rise in the medium term to the development of computerized diagnostic and therapeutic follow-up tools for the benefit of those involved in mental health.

8.2.4 Cognitive traces of side issues

It is by now widely believed that natural language communication operates at several levels. This means that information is distributed across several partially independent dimensions. For instance, a sentence like My stupid neighbor made noise again simultaneously conveys that my neighbor made noise (the truth-conditional content), that the speaker considers him stupid (an expressive, side issue 1) and that he had made noise before (the presupposition of again, side issue 2). While these phenomena have been extensively described from an empirical perspective, there is at present no unified framework for representing their differences and possible interactions from a formal, computational or cognitive point of view.

We examined the motor effects of presuppositions, using the convenient lexical material of factive verbs, that is, verbs which presuppose the truth of the complement clause. For example, Mary knows that Paul cheated on the exam presupposes that Paul cheated on the exam (side issue) and asserts (truth-conditional content) that she believes it. It has been shown that the oral presentation of movement-related verbs like jump or push elicits some activation in the motor cortex and ultimately results in an involuntary contraction of the thumb-index arc, which can be recorded by a special electromagnetic cell, called a grip force sensor.

We adapted this technique to the case of factive verbs on a series of sentences of the form Mary knows that Paul throws the ball, compared with high base-level sentences like Paul throws the ball and low base-level sentences like Paul does not throw the ball. In summary, our results indicate that the sentences with factive verbs elicit a response very similar to that of high base-level ones and, as expected, a different response from that of low base-level ones. This suggests that, at least for factive verbs, the presupposed status leaves no trace of a special cognitive treatment, which would lead, for instance, to a delayed or weaker motor response.

However, when applied to more complex negative sentences like Mary does not know that Paul throws the ball, there is no evidence of a motor trace. This is in agreement with observations in the literature suggesting that, under negation, presuppositions are more 'difficult' to process than in simple assertive sentences. More precisely, in the case of motor response, negation interacts with the presupposition, which suggests that truth-conditional content and side issue cognitive processing cannot be totally separated 8.

8.3 Common Basic Resources

Participants: Maxime Amblard, Hee-Soo Choi, Philippe de Groote, Bruno Guillaume, Guy Perrier, Sylvain Pogodalla, Karën Fort, Valentin Richard.

8.3.1 Universal Dependencies and Surface Syntactic Universal Dependencies

The Universal Dependencies project (UD) aims at building a syntactic dependency scheme which allows for similar analyses for several different languages. Bruno Guillaume and Guy Perrier are active in the UD community, and participate in the development and the improvement of the French data in this international initiative. In 2023, two new versions (2.12 and 2.13) of the UD data were released.

During 2023, they continued working, in collaboration with Sylvain Kahane, Kim Gerdes and their teams, on the promotion of the Surface Syntactic Universal Dependencies (SUD) framework. SUD is an annotation scheme for syntactic dependency treebanks that is almost isomorphic to UD (Universal Dependencies). Contrary to UD, it is based on syntactic criteria (favoring functional heads) and the relations are defined on distributional and functional bases. In 41, they describe how function words are considered in the SUD formalism and compare with other approaches.
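The preference for functional heads can be sketched on a toy example: in UD an adposition depends on its noun through the case relation, whereas in SUD the adposition governs the noun. The token format, the function name and the choice to keep the incoming label unchanged are simplifying assumptions for this illustration; the actual UD to SUD conversion covers many more constructions and also relabels relations.

```python
def promote_adpositions(tokens):
    """Return a copy of `tokens` where each `case` dependent becomes
    the head of the noun it marked (functional head on top).

    Each token is a dict with keys id, form, head and deprel.
    Simplification: the adposition inherits the noun's incoming
    relation label unchanged.
    """
    out = [dict(t) for t in tokens]
    by_id = {t["id"]: t for t in out}
    for tok in out:
        if tok["deprel"] == "case":
            noun = by_id[tok["head"]]
            # The adposition takes over the noun's attachment.
            tok["head"], tok["deprel"] = noun["head"], noun["deprel"]
            # The noun becomes the complement of the adposition.
            noun["head"], noun["deprel"] = tok["id"], "comp:obj"
    return out


# UD-style analysis of "sleeps in Paris" (head 0 marks the root).
ud = [
    {"id": 1, "form": "sleeps", "head": 0, "deprel": "root"},
    {"id": 2, "form": "in", "head": 3, "deprel": "case"},
    {"id": 3, "form": "Paris", "head": 1, "deprel": "obl"},
]
for tok in promote_adpositions(ud):
    print(tok["id"], tok["form"], tok["head"], tok["deprel"])
```

The rewrite only swaps the direction of one arc and moves one attachment, which gives an intuition of why the two schemes are described as almost isomorphic.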

A suggestion to add clause type features and change the annotations of some French expressions regarding interrogatives was investigated by Valentin Richard 33.

8.3.2 Multiword annotation in the Parseme project

In 40, the new version 1.3 of the PARSEME multilingual corpus annotated with verbal multiword expressions (VMWEs) is presented. Since the previous version, new languages have joined the undertaking of creating such a resource, some of the already existing corpora have been enriched with new annotated texts, while others have been enhanced in various ways. The PARSEME multilingual corpus now covers 26 languages. All monolingual corpora therein use the Universal Dependencies v.2 tagset. They are (re-)split observing the PARSEME v.1.2 standard, which puts an emphasis on unseen VMWEs. With the current iteration, the corpus release process has been detached from shared tasks; instead, a process for continuous improvement and systematic releases has been introduced.

During the MWE workshop, we showed 20 how Grew and Grew-match can be adapted to MWE annotated data of the Parseme project. The tool can be used for linguistic exploration of the data, for helping the manual annotation process and for finding errors or inconsistencies in the annotations.

8.3.3 Induction of Descriptive Grammars

The ANR project Autogramm (Induction of descriptive grammar from annotated corpora) started in 01 2022. The goal of this project is to automate, as far as possible, the extraction of descriptive grammars and grammatical descriptions from annotated corpora for linguistic and typological studies. The project also promotes the development of treebanks for low-resourced languages, in order to extract quantitative descriptive grammars for these languages.

In 31, the project was presented to the French community. Treebanks for new languages were also released during 2023, one in Zaar and one in Haitian Creole.

8.3.4 Mapping Lexical Resources

Lexical resources are essential for the development of tools and methods for the various tasks of NLP. In French, the existing resources are heterogeneous in their size, their construction and their level of linguistic description, which opens the way to explore a method to group or link/map them automatically. French lexical resources are plentiful, but have not all been used in the same way. In 27, Hee-Soo Choi presented a diachronic analysis of 34 French lexical resources over a 20-year period in the proceedings of the TALN conference, showing differences in the types of application and reuse of resources by the community.

8.3.5 Sentence Semantic Similarity Corpus

In the context of the CODEINE ANR project and more specifically of Nicolas Hiebel's PhD thesis, Karën Fort worked with Aurélie Névéol (LISN-CNRS) and Olivier Ferret (CEA) on the creation of a sentence semantic similarity corpus for French in the clinical domain. A paper on the subject was presented at EACL 2023 21, a rank A conference of the domain. An adaptation in French of the paper was accepted at TALN 2023 29.

The key idea of the project is to use confidential corpora to automatically generate anonymous synthetic texts capable of emulating real documents from the perspective of their linguistic characteristics. The project will rely on a Game With A Purpose and crowdsourcing to validate and then annotate these synthesized clinical texts.

This game, developed by Bertrand Remy, is called HostoMytho (see Section 7.1.4), and includes various mini-games for different annotation layers, such as negation, error typing, or plausibility rating. The game is multi-platform, and therefore intended to be used on the web, on Android and iOS.

8.4 Ethics and biases

Participants: Karën Fort, Maxime Amblard, Michel Musiol, Marc Anderson, Fanny Ducel.

8.4.1 Ethics@Loria

Karën Fort originated a working group at LORIA for AI ethics (ethics@loria), involving researchers from various teams, including Maxime Amblard, Marc Anderson, Armelle Brun (BIRD), Mathieu d'Aquin (Orpailleur), Christophe Cerisara (Synalp), Anne Bonneau (Multispeech), Slim Ouni (Multispeech) and Abdessamad Imine (Pesto). Aurore Coince helped manage the group. Ethics@loria proposed the Doctoral training on Ethics "Write your dystopia".

8.4.2 Evaluating Stereotypes in Masked Language Models in Many Languages

Following the creation of French CrowsPairs 6, Karën Fort contacted researchers interested in creating a CrowsPairs corpus for their language, in order to test the language models. The group grew to include 22 researchers (including Fanny Ducel, from the team) for 7 languages (German, Maltese, Spanish, Italian, Chinese, standard Arabic and Catalan). The group worked together for more than a year to produce corpora and test masked language models in these 7 new languages. The corpora are freely available, along with the code to test the language models and the guidelines we followed for the adaptation. It is important to note that this work has been performed without any funding. The work has been detailed in a paper submitted to LREC-COLING 2024.

8.4.3 Evaluating stereotypes in autoregressive language models

Fanny Ducel authored a critical literature review on the topic of stereotypical biases in language models, under the supervision of Karën Fort and Aurélie Névéol. She presented this work at the Workshop on Algorithmic Injustice in Amsterdam in June 2023 36. This study was part of her internship and master's thesis on the evaluation of stereotypical biases in autoregressive language models, which has been her PhD topic since October 2023.

Karën Fort is PI of a new 4-year ANR project (2023-2027), InExtenso (Intrinsic and Extrinsic evaluation of biases in large language models), in collaboration with Rouen's hospital (CHU) and LISN-CNRS. The project aims at better identifying stereotyped biases in LLMs in French and, when possible, mitigating them. At LORIA, the project reaches beyond Sémagramme, as Miguel Couceiro (Orpailleur) joined us on the bias mitigation part.

8.4.4 NLP for NLP and Ethics

Karën Fort originated a research group on the impact of the BigTech companies on NLP (industry presence, potential thematic shifts, participation in paper authorship). This group is composed of Aurélie Névéol (LISN), Saif Mohammad (National Research Council Canada), Mohamed Abdalla (University of Toronto), Terry Lima Ruas and Jan Philip Wahle (Wuppertal University) and Fanny Ducel (M2 student at Sorbonne University). The group regularly gathered from 09 to 12 2022 to perform both an automatic study on all the ACL Anthology papers and a manual one from the ACL 2022 papers. The resulting paper was accepted at ACL 2023 15, the A* conference in NLP.

8.4.5 Ethics in AI Integration into Industry

Karën Fort and Marc Anderson completed the Ethics by Design component of the EU Horizons AI-Proficient project, which ended in October 2023. The final year of the project involved a review of ethics integration in the project in comparison to the EU High Level Expert Group Guidelines, as well as an analysis of the results of the implementation of ethical recommendations given to project partners. That analysis was aimed at providing a model for future projects and AI ethics researchers for operationalizing AI ethics, which remains very rare in the domain. Contributions were made to various late-stage deliverables of the project, namely those of the WP6 which dealt with project validation, and in particular the ethics team provided the AI-Proficient Deliverable 6.4: AI-Proficient Ethical Recommendations.

Marc Anderson 11 authored a journal paper in Studia Philosophica Wratislaviensia carrying out an analysis of public ethical concerns and expectations for AI as compared to the older technology of the telegraph. The latter paper was initially presented as “AI Ethics and the Lessons of History,” in the 2nd International Conference on the Ethics of Artificial Intelligence (2ICEAI) 34. Marc Anderson also presented at the Sohoma 2023 conference 43, on the topic of ethical sustainability in digital manufacturing and industry 4.0 generally.

The ongoing collaboration between Karën Fort and Marc Anderson resulted in no new common article in 2023 (one has been accepted and will be published in 2024, though), but Karën Fort has been invited as a keynote at ERGO IA 13 to talk about how the recommendations made by the ethics group were taken into account in the AI-Proficient project.

9 Bilateral contracts and grants with industry

9.1 Bilateral Grants with Industry

9.1.1 Yseop

Participants: Philippe de Groote, Maxime Guillaume, Sylvain Pogodalla.

The Sémagramme team has set up a Cifre thesis contract with Yseop on ACG extensions and use in an industrial environment.

10 Partnerships and cooperations

10.1 International research visitors

10.1.1 Visits to international teams

Research stays abroad

Amandine Decker

Visited institution:
University of Gothenburg.
Country:
Sweden.
Dates:
25 March to 14 September.
Context of the visit:
Amandine Decker is working on her doctoral thesis under the joint supervision of Maxime Amblard and Ellen Breitholtz (University of Gothenburg).
Mobility program/type of mobility:
European Erasmus cooperation.

10.2 European initiatives

10.2.1 H2020 projects

AI-Proficient

Participants: Marc Anderson, Karën Fort.

Title:
Artificial intelligence for improved production efficiency, quality and maintenance
Duration:
11 2020–10 2023
Coordinator:
Benoit Iung (CRAN, Université de Lorraine)
Partners:
Université de Lorraine (coordination), Continental (industrial partner), Ineos (industrial partner), Institute Mihailo Pupin (Serbia), Tekniker (Spain), Ibermatica (Spain), TenForce (Belgium), VTT (Finland), Inos Hellas (Greece), ATC Athens Technology Center (Greece)
Participants:
Marc Anderson, Karën Fort
Abstract:
AI-Proficient carries out research on integrating AI services into the industrial contexts of three factories located in France, Belgium, and Germany. Two Sémagramme members make up the ethics team for the project since its beginning in 2020: Karën Fort (project ethics officer) and Marc Anderson (postdoctoral fellow).

10.2.2 Other european programs/initiatives

Bruno Guillaume is a member of the core group of the COST Action CA21167 - Universality, diversity and idiosyncrasy in language technology (UniDive). He is the leader of the working group named "Corpus Annotation".

10.3 National initiatives

10.3.1 ANR Project: InExtenso

Participants: Karën Fort, Maxime Amblard, Michel Musiol, Fanny Ducel.

Title:
Intrinsic and Extrinsic evaluation of biases in large language models
Duration:
10 2023–09 2027
Coordinator:
Karën Fort
Partners:
CHU Rouen, LISN, LORIA
Participants:
Maxime Amblard, Fanny Ducel, Karën Fort (coordinator), Michel Musiol, Miguel Couceiro
Abstract:
Large Language Models (LLM) are the Swiss Army knife of today’s Natural Language Processing (NLP). They often outperform the state-of-the-art on benchmarks commonly used in the field for tasks such as part-of-speech tagging, text classification and named-entity recognition, thus paving the way to a myriad of end-user applications. However, it has been shown that LLM exhibit major ethical issues including significant environmental impact, mirroring and amplification of stereotyped biases, which in turn have a disproportionate impact on historically disadvantaged social groups. It is urgent to address the social impact of NLP as the applications we develop, such as chatGPT, are now directly made available to end users. The detection and mitigation of biases have therefore become an active area of research in the past few years, focusing mainly on Masked Language Models (MLM) such as BERT in English and the North American social context. Several sources of bias were identified in the NLP pipeline. However the interconnection between sources and overall impact of each source on downstream applications remains unclear. In this project, we want to observe the entire pipeline, from the intrinsic point of view (within the model itself), to the pre-training task point of view (in the case of autoregressive LLM, text generation), on to some real-world downstream applications. We chose to focus on two types of medical applications: mental illness diagnosis help and information extraction from clinical records for public health purposes such as patient enrollment into clinical trials. The project will provide corpora and methods for a global evaluation of bias in LLM in French as well as studies to further the understanding of biases in clinical NLP pipelines and the environmental impact of the integration of these models in digital health.

10.3.2 ANR Project: CoDeinE

Participants: Karën Fort, Bruno Guillaume, Bertrand Remy.

Title:
artificial text COrpus DEsIgNed Ethically automatic synthesis of clinical documents
Duration:
03 2021–02 2025
Coordinator:
Aurélie Névéol (Limsi)
Partners:
CRC, CEA List, LISN, LORIA
Participants:
Bruno Guillaume, Karën Fort (local coordinator), Bertrand Remy
Abstract:
Machine learning methods have become prevalent in language technologies. They rely on annotated corpora to train models and evaluate algorithms. The CoDeinE project proposes to address the lack of shareable corpora in sensitive domains such as health or banking. The key idea of the project is to use confidential corpora to automatically generate synthetic texts that mimic the linguistic properties of real documents while preserving confidentiality. We will use clinical documents in electronic patient records as a case study. Furthermore, the project will rely on Games With A Purpose and crowd sourcing to validate and annotate the synthesized texts.

10.3.3 ANR Project: Autogramm

Participants: Bruno Guillaume, Karën Fort, Guy Perrier, Khensa Amani Daoudi.

Title:
Induction of descriptive grammar from annotated corpora
Duration:
01 2022–12 2025
Coordinator:
Sylvain Kahane (Université Paris Nanterre)
Partners:
MoDyCo, LACITO, LISN, Inria Nancy – Grand Est
Participants:
Bruno Guillaume (local coordinator), Karën Fort, Guy Perrier
Abstract:
The goal of this project is to automate, as far as possible, the extraction of descriptive grammars and grammatical descriptions from annotated corpora for linguistic and typological studies. The project also promotes the development of treebanks for under-endowed languages, in order to extract quantitative descriptive grammars for these languages. The project uses the annotation scheme SUD (Surface-syntactic Universal Dependencies), the query tool Grew-match and the annotation tool ArboratorGrew.

10.3.4 ANR Project: CODIM

Participants: Maxime Amblard, Jacques Jayez.

Title:
Compositionality and discourse markers
Duration:
01 2023–12 2026
Coordinator:
Mathilde Dargnat (Université de Lorraine and ATILF)
Partners:
ATILF, LLF, Inria Nancy – Grand Est
Participants:
Maxime Amblard, Jacques Jayez
Abstract:
The CODIM project focuses on the two main linguistic resources for organizing monologues or conversations in human languages: D(iscourse) M(arkers) (therefore/donc, well/ben, bon etc. in English/French) and prosody (in particular, intonation). It will evaluate their status with respect to two major views on communication: compositionality (the possibility of combining meaningful expressions into more complex meaningful expressions) and pattern or construction-based approaches (the idea that language users exploit partly 'frozen' strings of words). We will compare the semantic and prosodic properties of simple and complex French DM (e.g. ah + bon) found in corpora for written and spoken French, using a variety of technical tools for DM identification (category-driven text mining), clustering (statistics and Machine Learning) and research in prosody (duration and intensity measures, contour representation). The project fosters a number of collaborations between linguists and computer scientists.

11 Dissemination

Participants: Maxime Amblard, Marc Anderson, Hee-Soo Choi, Marie Cousin, Amandine Decker, Philippe de Groote, Fanny Ducel, Karën Fort, Bruno Guillaume, Jacques Jayez, Chuyuan Li, Michel Musiol, Guy Perrier, Sylvain Pogodalla, Valentin Richard, Vincent Tourneur.

11.1 Promoting scientific activities

11.1.1 Scientific events: organisation

Maxime Amblard: main organiser of the 15th International Conference on Computational Semantics, (IWCS 2023), Nancy 20-23 06 2023.
Valentin Richard: local chair of the workshop Inquisitiveness Below and Beyond the Sentence Boundary (InqBnB4), Nancy, 20 06 2023.
The Sémagramme team organized a scientific memorial day in honor of their colleague Guy Perrier who passed away in 2023.

General chair, scientific chair

Karën Fort: co-organizer of the Journées scientifiques du GDR LIFT, Nancy, 20-21 11 2023. 44

Member of the organizing committees

Bruno Guillaume, Hee-Soo Choi, Marie Cousin, Amandine Decker, Amandine Lecomte, Chuyuan Li, Siyana Pavlova, Valentin Richard, Khensa Daoudi, Vincent Tourneur, Fanny Ducel, Julie Halbout, and Laura Masson-Grehaigne: members of the organizing committee of the 15th International Conference on Computational Semantics, (IWCS 2023), Nancy 20-23 06 2023.

11.1.2 Scientific events: selection

Chair of conference program committees

Maxime Amblard: general chair of the 15th International Conference on Computational Semantics (IWCS 2023), Nancy 20-23 06 2023.

Member of the conference program committees

Philippe de Groote, Siyana Pavlova, and Valentin Richard: members of the program committee of the 15th International Conference on Computational Semantics (IWCS 2023).
Sylvain Pogodalla: member of the program committee of the 29th conference on traitement automatique des langues naturelles (TALN 2023).

Reviewer

Maxime Amblard: ARR 2023, ACL 2023, EMNLP 2023, JPC 2023, Lift, Semdial, SICon.
Marc Anderson: 12th International Workshop on Service Oriented, Holonic and Multi-Agent Manufacturing Systems for Industry of the Future (SOHOMA’22) Sustainability for the digital manufacturing era.
Philippe de Groote: ACL Rolling Review (ARR).
Karën Fort: 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), traitement automatique des langues naturelles (TALN 2023).
Sylvain Pogodalla: 28th Workshop on Logic, Language, Information and Computation (WoLLIC 2022).
Valentin Richard: 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023).

11.1.3 Journal

Member of the editorial boards

Maxime Amblard: Member of the editorial board and chief editor of the journal Traitement Automatique des Langues, in charge of the pdf pipeline.
Philippe de Groote: Area editor of the FoLLI-LNCS series.
Michel Musiol: Psychological and educational sciences (Université d'ElOued Ed).
Sylvain Pogodalla: Member of the editorial board of the journal Traitement Automatique des Langues, in charge of the Résumés de thèses section.

Reviewer - reviewing activities

Marc Anderson: Research Ethics: The International Journal of Research Ethics & Research Integrity, Journal of Social Computing.

11.1.4 Invited talks

Karën Fort: Ethics by design for real: lessons learned from an industry 4.0 European project. ERGO IA, Biarritz, France, 12 10 2023 13.
Guy Perrier: Why is graph rewriting interesting for computational linguistics? Keynote presentation at GURT/SyntaxFest 2023, Washington D.C., 9 3 2023 14.

11.1.5 Invited seminars

Hee-Soo Choi: Corpus-based Language Universals Analysis using Universal Dependencies. Team seminar Traitement Automatique du Langage Écrit et Parlé (TALEP), Laboratoire d'Informatique et Systèmes (LIS), Marseille, France, 23 03 2023.
Bruno Guillaume: Introduction to Grew-match. 1st UniDive webinar, online, 19 6 2023.
Valentin Richard: Extraction des Interrogatives de Corpus Francophones Annotés en Dépendances Universelles, Café TAL, Atilf, Nancy, 26 06 2023.

11.1.6 Leadership within the scientific community

Maxime Amblard: Leader of OLKI2.0 project (Lorraine Université d'Excellence project - PIA).
Karën Fort has been co-chair of the ACL Ethics Committee since 2021, with Min-Yen Kan (Univ. of Singapore) and Y. Tsvetkov (Univ. of Washington).
Bruno Guillaume is a Working Group leader in COST Action CA21167 - Universality, diversity and idiosyncrasy in language technology (UniDive).

11.1.7 Scientific expertise

Maxime Amblard: evaluation for the ANR generic call 2023.

11.1.8 Research administration

Maxime Amblard:
- Head of the master in Natural Language Processing (master 1 and 2).
- Member of CNU 27 (Computer Science).
Karën Fort:
- Member of CNU 27 (Computer Science): participation in qualifications, promotions and career monitoring (suivi de carrière).
Bruno Guillaume:
- Head of the Natural Language Processing and Knowledge Discovery department of the LORIA laboratory.
- Manager (with Alain Polguère) of the CPER (Contrat de Plan État-Région) Langues, Connaissances et Humanités Numériques.
Sylvain Pogodalla:
- Elected member of the comité de centre Inria Nancy – Grand Est.
- In charge of the local commission IES (information et édition scientifique) of the Inria Nancy – Grand Est and LORIA.
- Member of the national commission IES of Inria.

11.2 Teaching - Supervision - Juries

11.2.1 Teaching

Licence:
- Maxime Amblard, AI Introduction, 36h, L1, Université de Lorraine, France.
- Maxime Amblard, Chuyuan Li and Marie Cousin, NLP for beginners, 20h, L2, Université de Lorraine, France.
- Maxime Amblard and Chuyuan Li, Linguistic engineering, 20h, L3, Université de Lorraine, France.
- Hee-Soo Choi, Databases, 14h, Telecom Nancy, Université de Lorraine, France.
- Amandine Decker, Research Methods in Linguistics, 10h, University of Gothenburg, Sweden.
- Karën Fort, Relational databases, 55h, L3, Sorbonne Université, France.
- Pierre Ludmann, Informatics 2, 27h, Mines Nancy, Université de Lorraine, France.
- Pierre Ludmann, Databases, 114h, Polytech Nancy, Université de Lorraine, France.
- Pierre Ludmann, Java project, 16h, Polytech Nancy, Université de Lorraine, France.
- Pierre Ludmann, Year-long project, 20h, Polytech Nancy, Université de Lorraine, France.
- Pierre Ludmann, Mentoring, 40h, Polytech Nancy, Université de Lorraine, France.
- Siyana Pavlova, Algorithmique - Programmation 1, 44h, L1, Université de Lorraine, France.
- Siyana Pavlova, Bases de données, 28h, L2, Université de Lorraine, France.
- Valentin Richard, Langages de Scripts, 50h, L3, Université de Lorraine, France.
- Vincent Tourneur, Structures de données, 40h, L1, IUT Charlemagne, Université de Lorraine, France.
Master:
- Maxime Amblard, Siyana Pavlova and Fanny Ducel, Python Programming, 30h, M1 NLP (IDMC), Université de Lorraine, France.
- Maxime Amblard and Siyana Pavlova, Methods for NLP, 36h, M1 NLP (IDMC), Université de Lorraine, France.
- Maxime Amblard, NLP project, 20h, M1 NLP (IDMC), Université de Lorraine, France.
- Maxime Amblard, Marie Cousin, and Amandine Decker, Formalisms and Syntax, 24h, M2 NLP (IDMC), Université de Lorraine, France.
- Maxime Amblard, Valentin Richard and Siyana Pavlova, Discourse and Dialogue, 18h, M2 NLP (IDMC), Université de Lorraine, France.
- Marie Cousin, Agile Method and Scrum, 4h, M2 NLP (IDMC), Université de Lorraine, France.
- Marie Cousin and Amandine Decker, Data Structures, 20h, M1 NLP (IDMC), Université de Lorraine, France.
- Philippe de Groote, Formal Logic, 22h, M1 NLP (IDMC), Université de Lorraine, France.
- Philippe de Groote, Formal languages, 22h, M1 NLP (IDMC), Université de Lorraine, France.
- Philippe de Groote, Computational Semantics, 18h, M2 NLP (IDMC), Université de Lorraine, France.
- Philippe de Groote, Computational structures and logics for natural language modeling, 18h, M2 NLP (IDMC), Université Paris Diderot – Paris 7, France.
- Karën Fort, Data Ethics, 18h, M1 ISF (Informatique et Statistique financières), Université Panthéon Assas, France.
- Karën Fort, Ethics and NLP (English), 17h30, M2 NLP (IDMC), Université de Lorraine, France.
- Karën Fort, Formal Grammar, 39h, M1, Sorbonne Université, France.
- Karën Fort, Corpora, resources and tools for linguistics, 39h, M1, Sorbonne Université, France.
- Karën Fort, Ethics and NLP, 15h, M2, Sorbonne Université, France.
- Karën Fort, Collaborative annotation for NLP, 30h, M2, Sorbonne Université, France.
- Bruno Guillaume, Written Corpora TAL (English), 45h, M1 NLP (IDMC), Université de Lorraine, France.
- Chuyuan Li, Introduction to G5K (English), 4h, M1 NLP (IDMC), Université de Lorraine, France.
- Pierre Ludmann, Software Engineering 2, 10h, Mines Nancy, Université de Lorraine, France.
- Pierre Ludmann, Compilation, 27.5h, Mines Nancy, Université de Lorraine, France.
- Pierre Ludmann, Department project, 5h, Mines Nancy, Université de Lorraine, France.
- Pierre Ludmann, Software Engineering, 42h, Polytech Nancy, Université de Lorraine, France.
- Pierre Ludmann, Web project, 6h, Polytech Nancy, Université de Lorraine, France.
- Pierre Ludmann, End-of-study internships, 20h, Polytech Nancy, Université de Lorraine, France.
- Pierre Ludmann, End-of-study project, 20h, Polytech Nancy, Université de Lorraine, France.
- Pierre Ludmann, Mentoring, 48h, Polytech Nancy, Université de Lorraine, France.
- Vincent Tourneur, UML beginners (English), 10h, M1 NLP (IDMC), Université de Lorraine, France.
Doctorate:
- Karën Fort, Scientific integrity, 6h, École doctorale 5, Faculté des lettres, Sorbonne Université.
- Karën Fort, Ethics and biases in NLP, Summer school of the GDR TAL, ETAL 06 2023 – Marseilles, France.
International Summer School:
- Bruno Guillaume with Kim Gerdes, Treebanking: methodology, tools and applications at the 34th European Summer School in Logic, Language and Information (ESSLLI 2023) in Ljubljana, Slovenia.
International Tutorials:
- Karën Fort with Luciana Benotti (Universidad Nacional de Córdoba, Argentina), Yulia Tsvetkov and Min-Yen Kan. Tutorial at EACL 2023 (CORE A): Understanding Ethics in NLP Authoring and Reviewing (Introductory) 51.

11.2.2 Supervision

PhD
- Chuyuan Li, Formal and Statistical Modeling of Dialogue, since 10 2019. Supervision: Maxime Amblard and Chloé Braud (IRIT).
PhD in progress
- Samuel Buchel, Linguistic, Semantic and Cognitive Modeling of Dialogical Incongruities and Discontinuities in The Interaction with The Schizophrenic Patients, since 12 2019. Supervision: Maxime Amblard and Michel Musiol.
- Hee-Soo Choi, Lier des ressources lexicales du français en vue d’une interopérabilité entre niveaux linguistiques, since 10 2021. Supervision: Karën Fort and Mathieu Constant.
- Marie Cousin, Modélisation de paraphrase dans les grammaires catégorielles abstraites, since 10 2022. Supervision: Philippe de Groote and Sylvain Pogodalla.
- Amandine Decker, Modelling Topic-level Interaction in Pathological Conversations, since 10 2022. Supervision: Maxime Amblard and Ellen Breitholtz (University of Gothenburg, Sweden).
- Fanny Ducel, Evaluating stereotyped biases in auto-regressive language models, since 10 2023. Supervision: Karën Fort and Aurélie Névéol (LISN-CNRS).
- Maxime Guillaume, Structures de traits pour les Grammaires Catégorielles Abstraites, since 07 2021. Supervision: Philippe de Groote and Raphaël Salmon (Yseop).
- Nicolas Hiebel, Création éthique de données textuelles artificielles : application au domaine biomédical, since 10 2021. Supervision: Aurélie Névéol (LISN-CNRS), Karën Fort and Olivier Ferret (CEA).
- Amandine Lecomte, Analyse longitudinale de prise en charge psychothérapeutique de patients psychiatriques et de patients atteints de maladies neurodégénératives : informatisation et modélisation dialogique des indices comportementaux associés à l’efficacité (vs échec) des stratégies de prise en charge tentées par les thérapeutes, since 10 2019. Supervision: Michel Musiol and Alexandra König.
- Pierre Ludmann, Dynamic Construction of Discursive Structures, since 09 2017. Supervision: Philippe de Groote and Sylvain Pogodalla.
- Siyana Pavlova, Tools and Methods for Semantic Annotation, since 11 2020. Supervision: Maxime Amblard and Bruno Guillaume.
- Valentin Richard, Aspects compositionnels et dynamiques de la sémantique inquisitrice, since 09 2021. Supervision: Philippe de Groote, Floris Roelofsen and Reinhart Muskens (Universiteit van Amsterdam, ILLC).
- Priyansh Trivedi, Injecting Lexical and Semantic Knowledge into Word, Phrasal and Sentence Embeddings, since 11 2021. Supervision: Philippe de Groote and Pascal Denis (until 05 2023).

11.2.3 Juries

Maxime Amblard. PhD reviewer (rapporteur) for Nesrine Bannour's PhD Information Extraction from Electronic Health Records: Studies on temporal ordering, privacy and environmental impact, Université Paris-Saclay, France, 3 11 2023.
Maxime Amblard. PhD jury for Nicolas Devatine's PhD Discourse-Driven Prediction and Characterization of Textual Bias, Université Toulouse III - Paul Sabatier, France, 23 10 2023.
Karën Fort. PhD reviewer (rapporteuse) for Nicolas Devatine's PhD Discourse-Driven Prediction and Characterization of Textual Bias, Université Toulouse III - Paul Sabatier, France, 23 10 2023.

11.3 Popularization

11.3.1 Internal or external Inria responsibilities

Maxime Amblard is a member of the scientific committee of )i( interstices.

11.3.2 Articles and contents

Karën Fort. Comment les "BigTech" investissent la recherche en Traitement automatique des langues ? INS2I Web site, CNRS, France, 12 09 2023.
Karën Fort. Interstices podcast. Quelle éthique pour les agents conversationnels ? 28 08 2023 53.

11.3.3 Education

Marie Cousin and Amandine Decker: ran a MATh.en.JEANS workshop at the Edmond de Goncourt secondary school in Pulnoy.

11.3.4 Interventions

Maxime Amblard. L’IA n’ira pas beaucoup plus loin. Oxford Style debate, Forum des Sciences Cognitives et du TAL, IDMC, 29 11 2023.
Maxime Amblard and Michel Musiol. Le repérage des signes de la pathologie mentale dans le discours, Cycle de conférences sur le Handicap, Inria, 6 04 2023.
Maxime Amblard. Je parle à mon ordinateur, c'est grave docteur ?, cycle de conférence A votre santé, 16 03 2023.
Karën Fort. Les enjeux éthiques de l’IA vus par le prisme du traitement automatique des langues. In Séminaire #3 IA: Enjeux éthiques et juridiques, ARCEP (Autorité de régulation des communications électroniques, des postes et de la distribution de la presse). Paris. 15 12 2023.
Karën Fort, Hee-Soo Choi and Fanny Ducel: 2023-12-07, presentation to high school students in the context of the "Chiche !" initiative (lycée Varoquaux, Tomblaine, France).
Karën Fort and Hee-Soo Choi: 2023-12-01, presentation to high school students in the context of the "Chiche !" initiative (lycée Claude Gellée, Épinal, France).
Marie Cousin and Amandine Decker: 2023-02-01 and 2023-02-02, presentation to high school students in the context of the "Chiche !" initiative (lycée Cormontaigne, Metz, France).
Marie Cousin, Amandine Decker and Vincent Tourneur: 2023-03-02 and 2023-03-03, presentation to high school students in the context of the "Chiche !" initiative (lycée Jean-Victor Poncelet, Saint-Avold, France).
Amandine Decker: 2023-10-24, talks with high school students in the context of the first edition of the “Les cigognes” mathematics and computer science research retreat in the Grand-Est region.
Valentin Richard gave a talk for Pint of Science at Taverne de l'Irlandais (Nancy) on 22 05 2023.
Valentin Richard was interviewed by the newspaper L'Est Républicain about his participation in Pint of Science, 19 05 2023.
Valentin Richard gave a talk 50 for the round-table discussion organized by CJD Section de Nancy at Télécom Nancy (Nancy), on 16 10 2023.

12 Scientific production

12.1 Major publications

1 Mohamed Abdalla, Jan Philip Wahle, Terry Ruas, Aurélie Névéol, Fanny Ducel, Saif M. Mohammad and Karën Fort. The Elephant in the Room: Analyzing the Presence of Big Tech in Natural Language Processing Research. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Volume 1: Long Papers, Toronto, Canada, Association for Computational Linguistics, 2023, 13141-13160.
2 Marc Anderson and Karën Fort. Human Where? A New Scale Defining Human Involvement in Technology Communities from an Ethical Standpoint. International Review of Information Ethics, August 2022.
3 Guillaume Bonfante and Bruno Guillaume. Non-size increasing Graph Rewriting for Natural Language Processing. Mathematical Structures in Computer Science 28(8), 2018, 1451-1484.
4 Guillaume Bonfante, Bruno Guillaume and Guy Perrier. Application of Graph Rewriting to Natural Language Processing. Logic, Linguistics and Computer Science Set, Volume 1, ISTE Wiley, 2018, 272.
5 Philippe de Groote and Makoto Kanazawa. A Note on Intensionalization. Journal of Logic, Language and Information 22(2), 2013, 173-194.
6 Aurélie Névéol, Yoann Dupont, Julien Bezançon and Karën Fort. French CrowS-Pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than English. ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, May 2022.
7 Sylvain Pogodalla. A syntax-semantics interface for Tree-Adjoining Grammars through Abstract Categorial Grammars. Journal of Language Modelling 5(3), 2017, 527-605.
8 Robert Reinecke, Tatjana A Nazir, Sarah Carvallo and Jacques Jayez. Factives at hand: When presupposition mode affects motor response. Journal of Experimental Psychology, 2022.

12.2 Publications of the year

International journals

9 Marc M Anderson and Karën Fort. Evaluating the acceptability of ethical recommendations in industry 4.0: an ethics by design approach. AI & Society: Knowledge, Culture and Communication, January 2024.
10 Marc M Anderson. How we will Discover Sentience in AI. Journal of Social Computing 4(3), 2023.
11 Marc M Anderson. Rare Opportunity or History Revisited? The Pitfalls and Prospects for Ethical AI in light of Public Ethical Responses to the Telegraph. Studia Philosophica Wratislaviensia 18(3), 2023.
12 Eric Ettore, Philipp Müller, Jonas Hinze, Matthias Riemenschneider, Michel Benoit, Bruno Giordana, Danilo Postin, Rene Hurlemann, Amandine Lecomte, Michel Musiol, Hali Lindsay, Philippe Robert and Alexandra König. Digital Phenotyping for Differential Diagnosis of Major Depressive Episode: Narrative Review. JMIR Mental Health 10, January 2023, e37225.

Invited conferences

13 Karën Fort. Ethics by design for real: lessons learned from an industry 4.0 European project. ERGO'IA, Biarritz, France, October 2023.
14 Guy Perrier. Why is graph rewriting interesting for computational linguistics? GURT/SyntaxFest 2023, Washington, United States, March 2023.

International peer-reviewed conferences

15 Mohamed Abdalla, Jan Philip Wahle, Terry Ruas, Aurélie Névéol, Fanny Ducel, Saif M. Mohammad and Karën Fort. The Elephant in the Room: Analyzing the Presence of Big Tech in Natural Language Processing Research. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Volume 1: Long Papers, Toronto, Canada, Association for Computational Linguistics, 2023, 13141-13160.
16 Marie Cousin. Meaning-Text Theory within Abstract Categorial Grammars: Towards Paraphrase and Lexical Function Modeling for Text Generation. Proceedings of the 15th International Conference on Computational Semantics (IWCS 2023), Nancy, France, Association for Computational Linguistics, June 2023.
17 Amandine Decker and Maxime Amblard. Analysing topic shifts in task-oriented dialogues. Journées scientifiques du GDR Lift - LIFT 2023, Nancy, France, November 2023.
18 Amandine Decker, Ellen Breitholtz, Christine Howes and Staffan Larsson. Topic and genre in dialogue. Proceedings of the 27th Workshop on the Semantics and Pragmatics of Dialogue (SEMDIAL 2023), Maribor, Slovenia, August 2023, 143-145.
19 Izaskun Fernandez, Kerman Lopez de Calle, Eider Garate, Regis Benzmuller, Melodie Kessler and Marc Anderson. Human-Feedback for AI in Industry. CENTRIC 2023, The Sixteenth International Conference on Advances in Human-oriented and Personalized Mechanisms, Technologies, and Services, Valencia, Spain, November 2023.
20 Bruno Guillaume. Graph-based multi-layer querying in Parseme Corpora. Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023), Dubrovnik, Croatia, Association for Computational Linguistics, June 2023, 58-64.
21 Nicolas Hiebel, Olivier Ferret, Karën Fort and Aurélie Névéol. Can Synthetic Text Help Clinical Named Entity Recognition? A Study of Electronic Health Records in French. The 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), Dubrovnik, Croatia, May 2023.
22 Chuyuan Li, Maxime Amblard and Chloé Braud. A Semi-supervised Dialogue Discourse Parsing Pipeline. Journées Scientifiques du GDR Lift (LIFT 2023), Nancy, France, November 2023.
23 Chuyuan Li, Patrick Huber, Wen Xiao, Maxime Amblard, Chloé Braud and Giuseppe Carenini. Discourse Structure Extraction from Pre-Trained and Fine-Tuned Language Models in Dialogues. Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), Dubrovnik, Croatia, May 2023, 2562-2579.
24 Siyana Pavlova, Maxime Amblard and Bruno Guillaume. Bridging Semantic Frameworks: mapping DRS onto AMR. Proceedings of the 15th International Conference on Computational Semantics (IWCS 2023), Nancy, France, June 2023.
25 Siyana Pavlova, Maxime Amblard and Bruno Guillaume. Structural and Global Features for Comparing Semantic Representation Formalisms. Proceedings of the 4th International Workshop on Designing Meaning Representation (DMR 2023), Nancy, France, June 2023.

National peer-reviewed Conferences

26 Vincent-Thomas Barrouillet, Maxime Amblard and Michel Musiol. Toward an automatic identification of discontinuities in the pathological discourse of patient with schizophrenia. Journées scientifiques du GDR Lift - LIFT 2023, Nancy, France, November 2023.
27 Hee-Soo Choi, Karën Fort, Bruno Guillaume and Mathieu Constant. Des ressources lexicales du français et de leur utilisation en TAL : étude des actes de TALN. Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 2 : travaux de recherche originaux -- articles courts, 18e Conférence en Recherche d'Information et Applications -- 16e Rencontres Jeunes Chercheurs en RI -- 30e Conférence sur le Traitement Automatique des Langues Naturelles -- 25e Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues, Paris, France, ATALA, 2023, 23-36.
28 Marie Cousin. Towards an implementation of meaning-text theory with abstract categorial grammars. Actes de CORIA-TALN 2023. Actes des 16e Rencontres Jeunes Chercheurs en RI (RJCRI) et 25e Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL), 18e Conférence en Recherche d'Information et Applications -- 16e Rencontres Jeunes Chercheurs en RI -- 30e Conférence sur le Traitement Automatique des Langues Naturelles -- 25e Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues, Paris, France, ATALA, 2023, 72-86.
29 Nicolas Hiebel, Olivier Ferret, Karën Fort and Aurélie Névéol. Les textes cliniques français générés sont-ils dangereusement similaires à leur source ? Analyse par plongements de phrases. Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 2 : travaux de recherche originaux -- articles courts, 18e Conférence en Recherche d'Information et Applications -- 16e Rencontres Jeunes Chercheurs en RI -- 30e Conférence sur le Traitement Automatique des Langues Naturelles -- 25e Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues, Paris, France, ATALA, 2023, 46-54.
30 Nicolas Hiebel, Olivier Ferret, Karën Fort and Aurélie Névéol. Similarité surfacique et similarité sémantique dans des cas cliniques générés. Journée d'étude sur la Similarité entre Patients, ATALA, SimPa 2023, Paris, France, March 2023.
31 Sylvain Kahane, Santiago Herrera, Bruno Guillaume and Kim Gerdes. Autogramm : développement simultané de treebanks et de grammaires à partir de corpus. Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 6 : projets, 18e Conférence en Recherche d'Information et Applications -- 16e Rencontres Jeunes Chercheurs en RI -- 30e Conférence sur le Traitement Automatique des Langues Naturelles -- 25e Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues, Paris, France, ATALA, 2023, 37-42.
32 Siyana Pavlova, Maxime Amblard and Bruno Guillaume. A Layered Approach to Semantic Representation. Journées scientifiques du GDR Lift - LIFT 2023, Nancy, France, November 2023.
33 Valentin D. Richard. Can French Interrogative Retrieval be Fully Machine-Based? Actes des journées LIFT 2023, 5èmes journées du Groupement de Recherche CNRS « Linguistique Informatique, Formelle et de Terrain » (LIFT 2023), Nancy, France, November 2023, 69-76.

Conferences without proceedings

34 Marc M Anderson. AI Ethics and the Lessons of History. 2nd International Conference on the Ethics of Artificial Intelligence, Porto, Portugal, November 2023.
35 Marc M Anderson. AI as Philosophical Ideology: A Critical look back at McCarthy’s Program. Philosophy in Technology Workshop 2nd Edition, Wrocław, Poland, April 2023.
36 Fanny Ducel, Aurélie Névéol and Karën Fort. Bias Identification in Language Models is Biased. Workshop on Algorithmic Injustice 2023, Amsterdam, Netherlands, June 2023.
37 Philippe de Groote, Maxime Guillaume, Agathe Helman, Sylvain Pogodalla and Raphaël Salmon. Extending Abstract Categorial Grammars with Feature Structures: Theory and Practice. Logic and Engineering of Natural Language Semantics 20 (LENLS20), Osaka, Japan, November 2023.
38 Philippe de Groote. On the semantics of dependencies: relative clauses and open clausal complements - extended abstract -. Logic and Engineering of Natural Language Semantics 20 (LENLS20), Osaka, Japan, November 2023.
39 Jacques Jayez. (Innocent?) Bias in Argumentation. IMPAQTS (Implicit Manipulation in Politics – Quantitatively Assessing the Tendentiousness of Speeches) final conference, Rome, Italy, April 2023.
40 Agata Savary, Cherifa Ben Khelil, Carlos Ramisch, Voula Giouli, Verginica Barbu Mititelu, Najet Hadj Mohamed, Cvetana Krstev, Chaya Liebeskind, Hongzhi Xu, Sara Stymne, Tunga Güngör, Thomas Pickard, Bruno Guillaume, Archna Bhatia, Marie Candito, Polona Gantar, Uxoa Iñurrieta, Albert Gatt, Jolanta Kovalevskaite, Simon Krek, Timm Lichte, Nikola Ljubešić, Johanna Monti, Carla Parra Escartín, Mehrnoush Shamsfard, Ivelina Stoyanova, Veronika Vincze and Abigail Walsh. PARSEME corpus release 1.3. Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023), Dubrovnik, Croatia, Association for Computational Linguistics, May 2023, 24-35.

Scientific book chapters

41 Kim Gerdes, Bruno Guillaume, Sylvain Kahane and Guy Perrier. Function words in Surface-Syntactic Universal Dependencies. Function words in dependency syntax, John Benjamins Publishing Company, 2023.
42 Philippe de Groote. Deriving Formal Semantic Representations from Dependency Structures. Logic and Engineering of Natural Language Semantics, 19th International Conference, LENLS19, Tokyo, Japan, November 19–21, 2022, Revised Selected Papers, Lecture Notes in Computer Science 14213, Springer, October 2023, 157-172.

Edition (books, proceedings, special issue of a journal)

43 Marc M Anderson. Exploring the Idea of Ethical Sustainability for Digital Manufacturing. Service Oriented, Holonic and Multi-Agent Manufacturing Systems for Industry of the Future. SOHOMA 2023. Studies in Computational Intelligence, Springer, 2023.
44 Karën Fort, Claire Gardent and Yannick Parmentier. Actes des 5èmes journées du Groupement de Recherche CNRS « Linguistique Informatique, Formelle et de Terrain ». November 2023, 135.
45 Lambek–Grishin Calculus: Focusing, Display and Full Polarization. Outstanding Contributions to Logic 25, Samson Abramsky on Logic and Structure in Computer Science and Beyond, August 2023, 877-915.
46 Valentin D. Richard and Floris Roelofsen. Proceedings of the 4th Workshop on Inquisitiveness Below and Beyond the Sentence Boundary (InqBnB4). Association for Computational Linguistics, 2023.

Doctoral dissertations and habilitation theses

47 Chuyuan Li. Facing Data Scarcity in Dialogues for Discourse Structure Discovery and Prediction. Université de Lorraine, August 2023.

Reports & preprints

48 Jacques Jayez. Argumentation et probabilités, ou pourquoi l'argumentation rationnelle n'est pas (toujours) un raisonnement. December 2023.
49 Yannick Parmentier, Sylvain Pogodalla, Rachel Bawden, Matthieu Labeau and Iris Eshkol-Taravella. Procédure de diffusion des publications de l'ATALA sur les archives ouvertes. ATALA, September 2023, 17.
50 Valentin D. Richard. The costs of Artificial Intelligence: Un aperçu des problèmes éthiques de l'IA. October 2023.

Other scientific publications

51 Luciana Benotti, Karën Fort, Min-Yen Kan and Yulia Tsvetkov. Understanding Ethics in NLP Authoring and Reviewing. Dubrovnik, Croatia, May 2023, 19-24.
52 Bruno Guillaume. Multi-layer querying in Corpora: Example of Parseme and UD. UniDive 1st general meeting, Saclay, France, March 2023.

12.3 Other

Scientific popularization

53 Karën Fort and Joanna Jongwane. Quelle éthique pour les agents conversationnels ? [podcast]. September 2023.

12.4 Cited publications

54 Maxime Amblard, Chloé Braud and Michel Musiol. Mon ordinateur est-il un bon psy ? Le TAL au service du diagnostic médical. Journée du GDR TAL : Intelligence artificielle et technologies des langues : l'ordinateur passe la barrière de la langue (2021), GDR TAL, Paris, France, January 2021.
55 Philippe de Groote and Sylvain Pogodalla. On the expressive power of Abstract Categorial Grammars: Representing context-free formalisms. 13(4), http://www.springerlink.com/content/1572-9583/, 2004, 421-438.
56 Philippe de Groote. Towards a Montagovian account of dynamics. Proceedings of the 16th Semantics and Linguistic Theory Conference (SALT 16), 2006.
57 Philippe de Groote. Towards abstract categorial grammars. Association for Computational Linguistics, 39th Annual Meeting and 10th Conference of the European Chapter, Toulouse, France, July 2001, 148-155.
58 Bruno Guillaume and Guy Perrier. Interaction Grammars. 7(2-4), 2009, 171-208.
59 Annie Kuyumcuyan and Michel Musiol. L'entretien clinique avec la personne polyhandicapée : un terrain commun sciences du langage / psychiatrie. Les sciences du langage face aux défis de la disciplinarisation et de l'interdisciplinarité. Malika Temmar, Marina Krylyschin, Guy Achard-Bayle (éds). January 2021.
60 Amandine Lecomte, Samuel Buchel, Nicolas Franck, Caroline Demily, Maxime Amblard and Michel Musiol. Organisations et fonctions du comportement verbal de type “backchannels” dans l'interaction clinique avec la personne souffrant de schizophrénie. 8eme Congrès mondial de linguistique française, Orléans, France, July 2022.
61 Pierre Ludmann, Sylvain Pogodalla and Philippe de Groote. Multityped Abstract Categorial Grammars and Their Composition. WoLLIC 2022 - 28th International Workshop on Logic, Language, Information, and Computation, Lecture Notes in Computer Science 13468, Iaşi, Romania, Springer International Publishing, September 2022, 105-122.
62 Igor Mel'čuk. Semantics: From Meaning to Text. Volume 1, Studies in Language Companion Series 129, Amsterdam/Philadelphia, John Benjamins Publishing Company, 2012.
63 Michel Musiol. Incohérence et formes psychopathologiques dans l’interaction verbale. Psychose, langage et action: Approches neuro-cognitives, 2009, 217.
64 Michel Musiol, Manuel Rebuschi, Samuel Buchel, Amandine Lecomte, Philippe de Groote and Maxime Amblard. Le problème de l'analyse des troubles de la pensée dans le discours avec la personne schizophrène : proposition méthodologique. 87(2), April 2022, 347-369.
65 Michel Musiol and Manuel Rebuschi. La rationalité de l'incohérence en conversation schizophrène. In press, 2006.
66 Stéphanie Padroni, Caroline Demily, Nicolas Franck, Christine Bocerean, Christian Hoffmann and Michel Musiol. Ajustement comportemental et mouvements de saccades oculaires dans la schizophrénie. 81, 2016, 365-379.
67 Guy Perrier. A French Interaction Grammar. RANLP 2007 - International Conference on Recent Advances in Natural Language Processing, IPP & BAS & ACL-Bulgaria, Borovets, Bulgaria, INCOMA Ltd, Shoumen, Bulgaria, September 2007, 463-467.
68 Arthur Trognon, Michel Musiol, Emilie Tinti, Blandine Beaupain and Jean Donadieu. Updated norms of the MOS-SF36 in the young French population. 2022.
