Section: New Results

Common Basic Resources

Participants : Maxime Amblard, Clément Beysson, Philippe de Groote, Bruno Guillaume, Guy Perrier, Sylvain Pogodalla, Nicolas Lefebvre.

Crowdsourcing Complex Language Resources

Using a Wikipedia corpus, we showed that participants in a game with a purpose can produce quality dependency syntax annotations [44]. In [15], we considered a more complex corpus of scientific language. We ran an experiment to evaluate the annotations produced by the players of the game, comparing them to a gold corpus annotated and adjudicated by domain experts.
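As an illustration of how player annotations can be compared to a gold corpus, the following minimal sketch computes unlabeled and labeled attachment scores (UAS and LAS) for one sentence. The choice of these metrics, the function name `attachment_scores`, and the toy sentence are assumptions made for illustration; the evaluation protocol actually used in [15] is not described here.

```python
from typing import List, Tuple

# One token annotation: (head index, dependency label); head 0 marks the root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for two annotations of the same token sequence."""
    if len(gold) != len(predicted):
        raise ValueError("annotations must cover the same tokens")
    if not gold:
        raise ValueError("empty annotation")
    # UAS counts tokens whose head matches; LAS also requires the same label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    full_hits = sum(g == p for g, p in zip(gold, predicted))
    return head_hits / len(gold), full_hits / len(gold)


# Hypothetical toy sentence "Le chat dort": gold vs. player annotation.
gold = [(2, "det"), (3, "nsubj"), (0, "root")]
player = [(2, "det"), (3, "obj"), (0, "root")]
uas, las = attachment_scores(gold, player)
```

In this toy case every head is correct but one label differs, so the labeled score is lower than the unlabeled one.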

We also ran two surveys of ZombiLingo's players, in order to better understand who they are and what motivates them to play, and to improve participation in the game [14].

Universal Dependencies

We participated in the development of new versions of the French part of the Universal Dependencies project (UD, http://universaldependencies.org/). Version 2.0 [58] was released in March 2017. In this version, a new French corpus, UD_French-Sequoia, was added. We built this corpus by automatic conversion (using the Grew software) of the data built in the Sequoia project.

Version 2.1 [24] was released in November 2017. The conversion process, using Grew, was applied to the French Treebank corpus, and led to a new corpus in Universal Dependencies: UD_French-FTB. In version 2.1, we also worked on the harmonization of the French subset of treebanks. The Grew software was used to explore the data, to check its consistency, and to correct it systematically.
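To give an idea of what a consistency check on a treebank can look like, here is a minimal sketch in plain Python over the CoNLL-U format used by Universal Dependencies. It does not use Grew and does not reproduce the checks actually run on the French treebanks; the functions `parse_conllu` and `check_sentence`, the two checks chosen, and the sample data are hypothetical.

```python
from typing import Dict, List

Token = Dict[str, str]


def parse_conllu(text: str) -> List[List[Token]]:
    """Split CoNLL-U text into sentences of basic tokens."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():  # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):  # sentence-level comment
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}")
        if "-" in cols[0] or "." in cols[0]:  # multiword range or empty node
            continue
        current.append({"id": cols[0], "head": cols[6], "deprel": cols[7]})
    if current:
        sentences.append(current)
    return sentences


def check_sentence(tokens: List[Token]) -> List[str]:
    """Return human-readable descriptions of structural problems."""
    problems = []
    ids = {t["id"] for t in tokens}
    roots = [t for t in tokens if t["head"] == "0"]
    if len(roots) != 1:
        problems.append(f"expected 1 root, found {len(roots)}")
    for t in tokens:
        if t["head"] != "0" and t["head"] not in ids:
            problems.append(f"token {t['id']}: head {t['head']} does not exist")
        if (t["head"] == "0") != (t["deprel"] == "root"):
            problems.append(f"token {t['id']}: head 0 and label root disagree")
    return problems


# Hypothetical sample: the second sentence contains a deliberately wrong head.
SAMPLE = "\n".join([
    "# text = Le chat dort",
    "\t".join(["1", "Le", "le", "DET", "_", "_", "2", "det", "_", "_"]),
    "\t".join(["2", "chat", "chat", "NOUN", "_", "_", "3", "nsubj", "_", "_"]),
    "\t".join(["3", "dort", "dormir", "VERB", "_", "_", "0", "root", "_", "_"]),
    "",
    "# text = Il pleut",
    "\t".join(["1", "Il", "il", "PRON", "_", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "pleut", "pleuvoir", "VERB", "_", "_", "5", "root", "_", "_"]),
    "",
])
reports = [check_sentence(s) for s in parse_conllu(SAMPLE)]
```

The first sentence yields an empty report, while the second is flagged for a missing root, a head that points outside the sentence, and a `root` label on a token whose head is not 0.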

The “enhanced dependencies” sketched in the UD 2.0 guidelines are a promising attempt in the direction of deep syntax, an abstraction of the surface syntax towards semantics. In [13] (collaboration with Marie Candito and Djamé Seddah), we proposed to go further and enrich the enhanced dependency scheme along two axes: extending the cases of recovered arguments of non-finite verbs, and neutralizing syntactic alternations. Doing so leads to both richer and more uniform structures, while remaining at the syntactic level, and thus rather neutral with respect to the type of semantic representation that can be further obtained. We implemented this proposal in two UD treebanks of French, using deterministic graph rewriting rules. Evaluation on a 200-sentence gold standard showed that deep syntactic graphs can be obtained from surface syntax annotations with high accuracy. Among all semantic arguments of verbs in the gold standard, 13.91% are impacted by syntactic alternation normalization, and 18.93% are additional edges corresponding to deep syntactic relations.
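The idea of neutralizing a syntactic alternation with a deterministic rewriting rule can be sketched as follows for the passive. This is a plain Python illustration, not Grew rule syntax and not the rule set of [13]; the `Edge` type, the mapping `PASSIVE_TO_CANONICAL`, and the example sentence are assumptions. Only the surface labels `nsubj:pass`, `obl:agent` and `aux:pass` are taken from the UD relation inventory.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    head: str
    label: str
    dep: str


# Assumed mapping from passive surface functions to canonical functions.
PASSIVE_TO_CANONICAL = {"nsubj:pass": "obj", "obl:agent": "nsubj"}


def neutralize_passive(edges: List[Edge]) -> List[Edge]:
    """Relabel passive arguments with the function they have in the active."""
    return [
        e._replace(label=PASSIVE_TO_CANONICAL.get(e.label, e.label))
        for e in edges
    ]


# Hypothetical surface analysis of "La souris est mangée par le chat".
surface = [
    Edge("mangée", "nsubj:pass", "souris"),
    Edge("mangée", "obl:agent", "chat"),
    Edge("mangée", "aux:pass", "est"),
]
deep = neutralize_passive(surface)

# Number of edges whose label was changed by the normalization.
changed = sum(s != d for s, d in zip(surface, deep))
```

After rewriting, the passive sentence receives the same argument labels as its active counterpart ("Le chat mange la souris"), which is the kind of uniformity the normalization aims for. Counting the changed edges over a whole gold standard is one way proportions of impacted arguments could be derived.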

In [16], we present a reflection on the annotation of written French corpora in syntax and semantics. This reflection is the result of work carried out on the SEQUOIA and the UD-FRENCH corpora.


There are two major levels of processing that are significant in the use of a computational semantic framework: semantic composition, for the construction of meanings, and inference, either to exploit those meanings, or to assist the determination of contextually sensitive aspects of meanings. FraCaS is an inference test suite for evaluating the inferential competence of different NLP systems and semantic theories. Providing an implementation of the inference level was beyond the scope of FraCaS, but the test suite nevertheless provides an overview of a useful and theory- and system-independent semantic tool [37].

There currently exists a multilingual version of the resource for Farsi, German, Greek, and Mandarin. We have started the translation into French: 10% of the resource has been translated so far as a testbed, in order to set up guidelines for the translation. We plan to complete the translation following these guidelines and use it as an experimental tool.