TALARIS stands for Traitement Automatique des Langues: Représentation, Inférence et Sémantique. As this name suggests, the aim of the TALARIS team is to investigate semantic phenomena (broadly construed) in natural language from a computational perspective. More concretely, TALARIS's goal is to develop grammars (with a special emphasis on French) with a semantic dimension, to explore the linguistic and computational issues involved in such areas as natural language generation, textual entailment recognition, discourse and dialogue modeling, pragmatics, and multilinguality, and to investigate the interplay between representation and inference in computational semantics for natural language.
The work of the TALARIS team can be subdivided into four overlapping and mutually supporting categories.
Computational Semantics. This theme is devoted to the theoretical and computational issues involved in building semantic representations for natural language. Special emphasis is placed on developing large-scale semantic coverage for the French language.
Discourse, Dialogue and Pragmatics. This theme is devoted to developing theoretical and computational models of discourse and dialogue processing, and investigating the inferential impact of pragmatic factors (that is, the factors affecting how human beings actually use language).
Logics for Natural Language and Knowledge Representation. This theme is devoted to theoretical and computational tools for working with logics suitable for natural language inference and knowledge representation. Special emphasis is placed on hybrid logic, higher-order logic, and discourse representation theory (DRT).
Multilinguality for Multimedia. This theme is devoted to creating generic ISO-based mechanisms for representing and dealing with multilingual textual information. The center of this activity is the MLIF (Multi Lingual Information Framework) specification platform for elementary multilingual units.
The major long term computational goals of the TALARIS team are:
The design and implementation of incremental clustering techniques that can handle heterogeneous textual data collections.
The creation of a large scale computational semantics framework for French that supports deep semantic analysis and surface realisation (the production of sentences from meaning representations).
The integration and use of this framework in systems interfacing 3D worlds and Natural Language Processing (NLP) technologies, e.g., extending a serious game with dialogue capabilities or exploiting Natural Language Generation to automate the production of language learning exercises in a 3D setting.
The creation of efficient inference systems for logics that are capable of representing natural language content and the background knowledge required to support reasoning.
Integrating language technology and semantic resources into multimedia applications.
These computational goals will be pursued in the context of theoretical investigations that will rigorously map out the required scientific and mathematical context.
New results on dynamic modal logics (e.g., memory logics) in terms of lower and upper complexity bounds, tableaux algorithms, expressive power, axiomatization and interpolation.
Mature expositions of inference frameworks for hybrid logics, together with stable implementations of inference systems.
Talaris systems ranked first and third in the international GIVE (Generating Instructions in Virtual Environments) challenge.
The 3-day Natal workshop, organized by Claire Gardent, which brings together students and researchers from Nancy, Saarbrücken and neighbouring areas around current themes in NLP, was successfully held (for the third year running) from 16 to 18 June 2010.
http://
We said above that the central research theme of TALARIS was computational semantics (where “semantics” is broadly construed to cover various pragmatic and discourse level phenomena) and that TALARIS is particularly focused on investigating the interplay between representation and inference. Another way of putting this would be to say that the scientific foundations of TALARIS's work boil down to the motto: computational linguistics meets computational logic and knowledge representation.
From computational linguistics we take the large linguistic and lexical semantics resources, the parsing and generation algorithms, and the insight that (whenever possible) statistical information should be employed to cope with ambiguity. From computational logic and knowledge representation we take the various languages and methodologies that have been developed for handling different forms of information (such as temporal information), the computational tools (such as theorem provers, model builders, model checkers, sat-solvers and planners) that have been devised for working with them, together with the insight that, whenever possible, it is better to work with inference tools that have been tuned for particular problems, and moreover that, whenever possible, it is best to devote as little computational energy to inference as possible.
This picture is somewhat idealized. For example, for many languages (and French is one of them) the large scale linguistic resources (lexicons, grammars, WordNet, FrameNet, PropBank, etc.) that exist for English are not yet available. In addition, the syntax/semantics interface often cannot be taken for granted, and existing inference tools often need to be adapted to cope with the logics that arise in natural language applications (for example, existing provers for Description Logic, though excellent, do not cope with temporal reasoning). Thus we are not simply talking about bringing together known tools, and investigating how they work once they are combined — often a great deal of research, background work and development is needed. Nonetheless, the ideal of bringing together the best tools and ideas from computational linguistics, knowledge representation and computational logic and putting them to work in coordination is the guiding line.
Over the next decade, progress in natural language semantics is likely to depend on obtaining a deeper understanding of the role played by inference. One of the simplest levels at which inference enters natural language is as a disambiguation mechanism. Utterances in natural language are typically highly ambiguous: inference allows human beings to (seemingly effortlessly) eliminate the irrelevant possibilities and isolate the intended meaning. But inference can be used in many other processes, for example, in the integration of new information into a known context. This is important when generating natural language utterances. For this task we need to be sure that the utterance we generate is suitable for the person being addressed. That is, we need to be sure that the generated representations fit in well with the recipient's knowledge and expectations of the world, and it is inference which guides us in achieving this.
Much recent semantic research actively addresses such problems by systematically integrating inference as a key element. This is an interesting development, as such work redefines the boundary between semantics and pragmatics. For example, van der Sandt's algorithm for presupposition resolution (a classic problem of pragmatics) uses inference to guarantee that new information is integrated in a coherent way with the old information.
The TALARIS team investigates such semantic/pragmatic problems from various angles (for example, from generation and discourse analysis perspectives) and tries to combine the insights offered by different approaches. For example, for some applications (e.g., the textual entailment recognition task) shallow syntactic parsing combined with fast inference in description logic may be the most suitable approach. In other cases, deep analysis of utterances or sentences and the use of a first-order inference engine may be better. Our aim is to explore these approaches and their limitations.
In an ideal world, computational semanticists would not have to worry overly much about linguistic resources. Large scale lexica, treebanks, and wide coverage grammars (supported by fast parsers and offering a flexible syntax semantics interface) would be freely available and easy to combine and use. The semanticist could then focus on modeling semantic phenomena and their interactions.
Needless to say, in reality matters are not nearly so straightforward. For a start, for many languages (including French) there are no large-scale resources of the sort that exist for English. Furthermore even in the case of English, the idealized situation just sketched does not obtain. For example, the syntax/semantics interface cannot be regarded as a solved problem: phenomena such as gapping and VP-ellipsis (where a verb, or verb phrase, in a coordinated sentence is missing and has to be somehow “reconstructed” from the previous context) still offer challenging problems for semantic construction.
Thus a team like TALARIS simply cannot focus exclusively on semantic issues: it must also have competence in developing and maintaining a number of different lexical resources (and in particular, resources for French).
TALARIS is involved in such aspects in a number of ways. For example, it participates in the development of an open source syntactic and synonymic lexicon for French, in an attempt to lay the ground for a French version of FrameNet; and it also works on developing a large scale, reversible (i.e., usable both for parsing and for generation) Tree Adjoining Grammar for French.
Once again, in the ideal world, not only would computational semanticists not have to worry about the linguistic resources at their disposal, but they would not have to worry about the inference tools available either. These could be taken for granted, applied as needed, and the semanticist could concentrate on developing linguistically inspired inference architectures. But in spite of the spectacular progress made in automated theorem proving (both for very expressive logics like predicate logics, and for weak logics like description logics) over the last decade, we are not yet in the ideal world. The tools currently offered by the automated reasoning community still have a number of drawbacks when it comes to natural language applications.
For a start, most of the efforts of the first-order automated reasoning community have been devoted to theorem proving; model building, which is also a useful technology for natural language processing, is nowhere near as well developed, and far fewer systems are available. Secondly, the first-order reasoning community has adopted a resolutely “classical” approach to inference problems: their provers focus exclusively on the satisfiability problem. The description logic community has been much more flexible, offering architectures and optimisations which allow a greater range of problems to be handled more directly. One reason for this has been that, historically, not all description logics offered full Boolean expressivity. So there is a long tradition in description logic of treating a variety of inference problems directly, rather than via reduction to satisfiability. Thirdly, many of the logics for which optimised provers exist do not directly offer the kinds of expressivity required for natural language applications. For example, it is hard to encode temporal inference problems in implemented versions of description logics. Fourthly, for very strong logics (notably higher-order logics) few implementations exist and their performance is currently inadequate.
These problems are not insurmountable, and TALARIS members are actively investigating ways of overcoming them. For a start, logics such as higher-order logic, description logic and hybrid logic are nowadays thought of as various fragments of (or theories expressed in) first-order logic. That is, first-order logic provides a unifying framework that often allows transfer of tools or testing methodologies to a wide range of logics. For example, the hybrid logics used in TALARIS (which can be thought of as more expressive versions of description logics) make heavy use of optimization techniques from first-order theorem proving.
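The first-order view of hybrid logic mentioned above can be illustrated with the textbook standard translation, which maps a hybrid formula to a first-order formula with one free variable. The sketch below is only an illustration under stated assumptions: the tuple encoding of formulas, the function name standard_translation and the relation symbol R are hypothetical choices made for this example and are not taken from any TALARIS tool.

```python
from itertools import count

# Formulas are nested tuples (an encoding assumed for this sketch):
#   ("prop", "p")      propositional symbol
#   ("nom", "i")       nominal (names exactly one point of the model)
#   ("not", f), ("and", f, g)
#   ("dia", f)         diamond: some R-successor satisfies f
#   ("at", "i", f)     satisfaction operator @_i f

def standard_translation(formula, var="x", fresh=None):
    """Return a first-order formula (as a string) expressing `formula` at point `var`."""
    fresh = fresh if fresh is not None else count()
    tag = formula[0]
    if tag == "prop":
        return f"{formula[1].upper()}({var})"
    if tag == "nom":
        # A nominal is true exactly at the point it names.
        return f"{var} = {formula[1]}"
    if tag == "not":
        return f"~({standard_translation(formula[1], var, fresh)})"
    if tag == "and":
        left = standard_translation(formula[1], var, fresh)
        right = standard_translation(formula[2], var, fresh)
        return f"({left} & {right})"
    if tag == "dia":
        y = f"y{next(fresh)}"  # fresh variable for the successor point
        body = standard_translation(formula[1], y, fresh)
        return f"exists {y}.(R({var},{y}) & {body})"
    if tag == "at":
        # @_i f: evaluate f at the point named by the nominal i.
        return standard_translation(formula[2], formula[1], fresh)
    raise ValueError(f"unknown connective: {tag}")

example = ("at", "i", ("dia", ("and", ("prop", "p"), ("nom", "j"))))
print(standard_translation(example))
# exists y0.(R(i,y0) & (P(y0) & y0 = j))
```

Because the output is plain first-order syntax, a formula translated in this way could in principle be handed to a first-order prover, which is one way the transfer of tools and testing methodologies described above can be realised.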
The role of empirical methods (model learning, data extraction from corpora, evaluation) has greatly increased in importance in both linguistics and computer science over the last fifteen years. TALARIS members have been working for many years on the creation, management and dissemination of linguistic resources reusable by the scientific community, both in the context of implementation of data servers, and in the definition of standardized representation formats like TAG-ML. In addition, they have also worked on the applications of linguistic ideas in multimodal settings and multimedia.
Such work is important to our scientific goals. As we said above, one of the most important points that needs to be understood about logical inference is how its use can be minimized and intelligently guided. Ultimately, such minimization and guidance must be based on empirical observations concerning the kinds of problems that arise repeatedly in natural language applications.
Finally, it should be remarked that the emphasis on empirical studies lends another dimension to what is meant by inference. While much of TALARIS's focus is on symbolic approaches to inference, statistical and probabilistic methods, either on their own or blended with symbolic approaches, are likely to play an increasingly important role in the future. TALARIS researchers are well aware of the importance of such approaches and are interested in exploring their strengths and weaknesses, and where relevant, intend to integrate them into their work.
The development of large scale grammars is a complex task which usually involves factorising information as much as possible. While good grammar writing and factorisation environments exist for “non tree grammars” (e.g., HPSG, LFG), this is not the case for “tree based grammars” such as TAG, Interaction Grammars or Tree Description Grammars. The Extended Metagrammar Compiler (XMG) developed at TALARIS remedies this shortcoming while additionally providing a clean and modular way to describe several linguistic dimensions, thereby supporting the production of tree grammars with semantic information.
TALARIS has a longstanding interest in the semantics and the processing of referential expressions. In recent years, an extensive corpus annotation has been carried out on 5,000 definite descriptions.
The tree adjoining grammar for French developed by TALARIS associates with each NL expression not only a syntactic tree but also a semantic representation. Interestingly, the semantic calculus used is reversible in that the association between strings and semantic representations is non-directional (declarative). We have put this feature to work and, over the years, have been developing a surface realiser for French called GenI.
In essence, the textual entailment recognition task is an inference task, namely deciding whether the information contained in a given text T1 can be inferred from the information provided by another text T2.
It is crucial to be able to answer this question. One important characteristic of natural language is the large number of ways in which it can express the same information. Many natural language processing applications like question answering, information retrieval, generation, and anaphora resolution need to deal with this diversity efficiently and accurately, and recognising textual entailments is a key step towards this.
Textual entailment recognition is a difficult task. We work on developing linguistically principled approaches to tackle specific entailment sources such as syntactic variations or the compositional semantics of factive verbs.
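Once two texts have been mapped to logical forms, the inference step of entailment recognition can be reduced to a validity check: the text entails the hypothesis if no situation makes the text true and the hypothesis false. The following sketch shows this reduction for a toy propositional encoding. The atom names, the encoding of the factive verb and the function entails are hypothetical and chosen for illustration only; a realistic system would need a semantic construction step and a first-order or description logic reasoner instead of a truth table.

```python
from itertools import product

def entails(premise, hypothesis, atoms):
    """True iff every valuation satisfying `premise` also satisfies `hypothesis`."""
    for values in product([False, True], repeat=len(atoms)):
        valuation = dict(zip(atoms, values))
        if premise(valuation) and not hypothesis(valuation):
            return False  # counter-model: premise true, hypothesis false
    return True

# Toy encoding (an assumption of this sketch) of
#   text:       "John knows that Mary left"
#   hypothesis: "Mary left"
# The factive verb "know" contributes the background axiom
#   knows_mary_left -> mary_left.
atoms = ["knows_mary_left", "mary_left"]
text = lambda v: v["knows_mary_left"] and (not v["knows_mary_left"] or v["mary_left"])
hypothesis = lambda v: v["mary_left"]

print(entails(text, hypothesis, atoms))  # True
print(entails(hypothesis, text, atoms))  # False
```

The second call fails because Mary may have left without John knowing it, which shows that entailment, unlike paraphrase, is directional.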
Members of TALARIS have long proposed and developed the idea of using inference (and in particular, using computational tools like model builders and theorem provers) as an integral part of different tasks in computational semantics, mainly during semantic construction.
TALARIS's main contribution in this topic has been the design of resolution and tableaux calculi for hybrid logics, calculi that were then implemented in the HyLoRes and HTab theorem provers. For example, TALARIS members have proved that the resolution calculus for hybrid logics can be enhanced with optimisations of order and selection functions without losing completeness. Moreover, a number of “effective” (i.e., directly implementable) termination proofs for hybrid logic have been established, for both resolution and tableaux based approaches, and the techniques are being extended to more expressive languages. Current work includes adding a temporal reasoning component to the provers, extending the architecture to allow querying against a background theory without having to explore the theory again with each new query, and testing the hybrid provers' performance against dedicated state-of-the-art provers from other domains (first-order logic, description logics) using suitable translations.
Moreover, we are interested in providing a range of inference services beyond satisfiability checking. For example, the current versions of HyLoRes and HTab include model generation (i.e., the provers can generate a model when the input formula is satisfiable).
We have also started to explore other decision methods (e.g., game based decision methods) which are useful for non-standard semantics like topological semantics. The prover HyLoBan is an example of this work.
MLIF (Multi Lingual Information Framework) is intended to be a generic ISO-based mechanism for representing and dealing with multilingual textual information. A preliminary version of MLIF has been associated with digital media within the ISO/IEC MPEG context and dealing with subtitling of video content, dialogue prompts, menus in interactive TV, and descriptive information for multimedia scenes. MLIF comprises a flexible specification platform for elementary multilingual units that may be either embedded in other types of multimedia content or used autonomously to localise existing content.
In 2010, Talaris addressed a new application domain, namely the integration of deep natural language processing (NLP) techniques with 3D worlds and games. A first foray into that theme has been the submission of two systems to the international GIVE (Generating Instructions in Virtual Environments) challenge. Two recently accepted EU funded projects (Interreg project Allegro and Eurostar project Emo-Speech) on that theme will permit a full-blown exploration of the research issues and of the technological problems arising in this area. This new theme builds on the tools and techniques developed by Talaris over the last 5 years for deep NLP and in particular, on the availability of an expressive grammar writing environment (XMG), of wide coverage deep grammars for French and English (SemTAG and SemXTAG), of a grammar based surface realiser (GenI) and of parsers (LLP2, SemConst) using these grammars.
AGREE (Asynchronous Grounding of REferential Expressions) is a set of modules that manage the grounding process at the reference level. It contains an interpretation evaluation module that construes understanding judgments made by the system and those manifested in the dialogue by the user, a dialogue module that maintains a coherent state of the dialogue (adjacency pairs), and a generation module (GenI) in order to produce paraphrases of the understood referents. The whole system has been implemented in Java and uses the same semantic/referential representation that was used in the MEDIA project.
Version: 0.1
License: GPL
Last update: 2008-11-12
Web site:
http://
Authors: Alexandre Denis
Contact: Alexandre Denis
A metagrammar compiler automatically generates a grammar from a reduced description called a MetaGrammar. This description captures the linguistic properties underlying the syntactic rules of a grammar. Various past and present TALARIS members have been working on metagrammar compilation since 2001 and several tools have been developed within this framework, from the MGC system of Bertrand Gaiffe (now of ATILF, Analyse et Traitement Informatique de la Langue Française, a Nancy-based CNRS unit) to the newly developed XMG system of Crabbé et al.
The XMG system is a second generation compiler that proposes (a) a representation language allowing the user to describe in a factorised and flexible way the linguistic information contained in the grammar, and (b) a compiler for this language (using a Warren Abstract Machine-like architecture). An innovative feature of this compiler is that it makes it possible to describe several linguistic dimensions, and in particular to define a natural Syntax/Semantics interface within the Metagrammar.
The compiler currently supports two syntactic formalisms (Tree Adjoining Grammars and Interaction Grammars) and the description of both the syntactic and the semantic dimension of natural language. The generated grammars are in XML format, which makes them easy to reuse. Plug-ins have been realised with the LLP2 parser, with Eric de la Clergerie's DyALog parser and with the GenI generator. Future work will deal with the modularisation and the extension of XMG to define a library of languages describing linguistic data, allowing users to describe their own target formalism.
Developed under the supervision of Denys Duchier, the XMG compiler is the result of an intensive collaboration with CALLIGRAMME. It has been implemented in Oz/Mozart and runs under the Linux, Mac, and Windows platforms. It is available with tools easing its use with parsers and generators (tree viewer, duplicate remover, anchoring module, metagrammar browser).
The system is currently being used and tested by Owen Rambow (Columbia University, USA) and Laura Kallmeyer (University of Tuebingen, Germany).
Version: 1.1.4
License: CeCILL
Last update: 27/09/2005
Web site:
http://
Documentation:
http://
Authors: Benoit Crabbé, Denys Duchier, Joseph Le Roux, Yannick Parmentier
Contact: Benoit Crabbé, Yannick Parmentier
Frolog is a dialogue system based on current technology from computational linguistics, artificial intelligence planning, and theorem proving. It implements a text adventure game engine that uses natural language processing techniques to analyse the player's input and generate the system's output.
The Frolog core is implemented in Prolog and Java, but it uses external tools for the most computationally demanding tasks. It performs syntactic analysis of the input based on an English grammar developed using XMG and computes a flat semantic representation using the Tulipa parser. It then uses the constructed semantic representation and an off-the-shelf planner to interpret the player's intention and change the world model accordingly. The world is modelled as a knowledge base in description logics, and accessed using the Description Logic theorem prover Racer. Finally, the results of the action, or descriptions of objects, are generated automatically, using the GenI generator.
Frolog is intended to serve as a laboratory for testing pragmatic theories about the phenomenon of accommodation. It is also the first integrated system to use SemTag (the LORIA toolbox for TAG-based Parsing and Generation).
Version: 1.0
License: GPL
Last update: 2008-11-07
Authors: Luciana Benotti, Alejandra Lorenzo, Laura Perez
Contact: Luciana Benotti
The GenI surface realiser is a successor of the InDiGen realiser. Also based on a chart algorithm, it is implemented in Haskell and aims for modularity, re-usability and extensibility. The system is “stand-alone” as we use the Glasgow Haskell compiler to obtain executable code for Windows, Solaris, Linux and Mac OS X.
The GenI generator uses efficient datatypes and intelligent rule application to minimise the generation of redundant structures. It also uses a notion of polarities as a means, first, of coping with lexical ambiguity and second, of selecting variants obeying given syntactic constraints.
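The idea behind polarity-based filtering can be sketched as a simple counting argument: if each candidate tree is annotated with the categories it provides (positive polarity) and the categories it still requires (negative polarity), then a combination of trees can only yield a complete sentence if the polarities cancel out, leaving a single unit of the root category. The code below is a brute-force sketch under these assumptions. The counting scheme, the toy lexicon and the function names are hypothetical and do not reproduce GenI's actual data structures or algorithm.

```python
from collections import Counter
from itertools import product

def net_polarity(trees):
    """Sum the polarities of the selected trees, dropping categories that cancel out."""
    total = Counter()
    for tree in trees:
        total.update(tree["polarities"])  # Counter.update adds counts, negatives included
    return {cat: n for cat, n in total.items() if n != 0}

def polarity_filter(candidates, root="s"):
    """Keep the selections (one tree per input item) whose polarities net to one root."""
    kept = []
    for choice in product(*candidates):
        if net_polarity(choice) == {root: 1}:
            kept.append([tree["name"] for tree in choice])
    return kept

# Hypothetical toy lexicon: the verb is lexically ambiguous between two trees.
john = [{"name": "john_np", "polarities": {"np": +1}}]
mary = [{"name": "mary_np", "polarities": {"np": +1}}]
loves = [
    {"name": "loves_active", "polarities": {"s": +1, "np": -2}},
    {"name": "loves_infinitive", "polarities": {"s": +1, "np": -1}},  # subject not realised
]

print(polarity_filter([john, loves, mary]))
# [['john_np', 'loves_active', 'mary_np']]
```

The infinitive variant is discarded before any tree combination is attempted because it would leave one noun phrase unused. This is the sense in which such a filter copes with lexical ambiguity cheaply.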
GenI is compatible with both a grammar for French (SemTag) and a grammar for English (SemXTag), both grammars being produced using the MetaGrammar Compiler. SemTag covers the basic syntactic structures of French as described in Anne Abeillé's book “An Electronic Grammar for French”. SemXTag has a coverage similar to that of XTAG, the TAG grammar for English developed by the University of Pennsylvania. Both grammars are additionally equipped with a compositional semantics supporting semantic construction (during parsing) and/or surface realisation.
The system can process the output of the XMG metagrammar compiler mentioned above.
Version: 0.20.2
License: GPL
Last update: 2009-11-16
Web site:
http://
Project(s): GenI
Authors: Carlos Areces, Claire Gardent, Eric Kow
Contact: Claire Gardent
HyLoRes is a resolution based theorem prover for hybrid logics (it is complete for the hybrid language H(@, ↓), a very expressive but undecidable language, and it implements a decision method for the sublanguage H(@)). It implements a version of the “given clause” algorithm, which is the underlying framework of many current state-of-the-art resolution-based theorem provers for first-order logic, and uses heuristics of order and selection functions to prune the space of possible generated clauses.
HyLoRes is implemented in Haskell, and compiled with the Glasgow Haskell compiler (thus, users need no additional software to use the prover). We have also developed a graphical interface.
The interest of HyLoRes is twofold: on the one hand it is the first mature theorem prover for hybrid languages, and on the other, it is the first modern resolution based prover for modal-like languages implementing optimisations and heuristics like ordered resolution with selection functions.
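The structure of a given clause loop can be shown on plain propositional clauses. The sketch below is not HyLoRes's calculus: it has no hybrid rules, no ordering and no selection function, and simply picks the smallest unprocessed clause as the given clause. Clauses are encoded as sets of non-zero integers (a negative integer stands for a negated atom), and both this encoding and the function names are assumptions of the example.

```python
def resolvents(c1, c2):
    """All propositional resolvents of two clauses (frozensets of non-zero ints)."""
    out = []
    for lit in c1:
        if -lit in c2:
            resolvent = (c1 - {lit}) | (c2 - {-lit})
            # Tautologies such as {p, -p, ...} are redundant and skipped.
            if not any(-l in resolvent for l in resolvent):
                out.append(frozenset(resolvent))
    return out

def given_clause_unsat(clauses):
    """Return True if the clause set is unsatisfiable, False once it is saturated."""
    unprocessed = {frozenset(c) for c in clauses}
    processed = set()
    while unprocessed:
        # Heuristic choice of the given clause: smallest first.
        given = min(unprocessed, key=lambda c: (len(c), sorted(c)))
        unprocessed.remove(given)
        if not given:
            return True  # empty clause derived: contradiction
        processed.add(given)
        # Resolve the given clause against everything processed so far.
        for other in list(processed):
            for r in resolvents(given, other):
                if r not in processed and r not in unprocessed:
                    unprocessed.add(r)
    return False  # saturation reached without deriving the empty clause

print(given_clause_unsat([[1], [-1, 2], [-2]]))  # True
print(given_clause_unsat([[1, 2], [-1, 2]]))     # False
```

Heuristics such as orderings and selection functions act on the two choice points of this loop: which clause becomes the given clause, and which literals of it are allowed to take part in resolution.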
Version: 2.5
License: GPL
Last update: 2009-04-09
Web site:
http://
Authors: Carlos Areces, Daniel Gorín and Juan Heguiabehere
Contact: Carlos Areces
The main goal behind HTab is to make available an optimised tableaux prover for hybrid logics, using algorithms that ensure termination. We ultimately aim to cover a number of frame conditions (e.g., reflexivity, symmetry, antisymmetry), as far as we can ensure termination. Moreover, we are interested in providing a range of inference services beyond satisfiability checking. For example, the current version of HTab includes model generation (i.e., HTab can generate a model from a saturated open branch in the tableau).
HTab and HyLoRes are currently being developed in coordination, and a generic inference system involving both provers is being designed. The aim is to take advantage of the dual behaviour existing between the resolution and tableaux algorithms: while resolution is usually most efficient for unsatisfiable formulas (because a contradiction can be reported as soon as the empty clause is derived), tableaux methods are better suited to handle satisfiable formulas (because a saturated open branch in the tableau represents a model for the input formula).
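The way a saturated open branch yields a model can be seen in a minimal propositional tableau. This is a sketch under simplifying assumptions and not HTab's algorithm: formulas are assumed to be in negation normal form, there are no modal or hybrid rules and hence no blocking, and the tuple encoding and the name tableau_model are hypothetical.

```python
def tableau_model(todo, literals=None):
    """Expand one branch; return a model (dict atom -> bool) or None if all branches close.

    Formulas are tuples: ("lit", name, polarity), ("and", f, g), ("or", f, g).
    """
    literals = dict(literals or {})
    todo = list(todo)
    while todo:
        formula = todo.pop()
        tag = formula[0]
        if tag == "lit":
            _, name, value = formula
            if literals.get(name, value) != value:
                return None  # branch closes: it contains both p and not p
            literals[name] = value
        elif tag == "and":
            todo.extend(formula[1:])  # both conjuncts stay on the same branch
        elif tag == "or":
            # Branching rule: try each disjunct on its own copy of the branch.
            for disjunct in formula[1:]:
                model = tableau_model(todo + [disjunct], literals)
                if model is not None:
                    return model
            return None
        else:
            raise ValueError(f"unknown connective: {tag}")
    return literals  # saturated open branch: its literals describe a model

p, not_p, q = ("lit", "p", True), ("lit", "p", False), ("lit", "q", True)
print(tableau_model([("and", ("or", p, q), not_p)]))  # {'p': False, 'q': True}
print(tableau_model([("and", p, not_p)]))             # None
```

A satisfiable input is answered as soon as one branch saturates without closing, whereas an unsatisfiable input forces every branch to be explored. This is the asymmetry that motivates pairing a tableau prover with a resolution prover.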
Version: 1.5.4
License: GPL
Last update: 2010-11-10
Web site:
http://
Authors: Carlos Areces, Guillaume Hoffmann
Contact: Guillaume Hoffmann
HyLoBan is a game-based prover, resulting from a direct implementation of Sustretov's game-based proofs of the PSPACE-completeness of the hybrid logics of T0 and T1 topological spaces. The interest of this approach is that termination is guaranteed and in addition the underlying game-based architecture is of independent interest; its disadvantage is that (at present) it is still extremely inefficient.
Version: 0.2
License: GPL
Last update: 2009-10-29
Web site:
http://
Authors: Carlos Areces, Guillaume Hoffmann, Dmitry Sustretov
Contact: Guillaume Hoffmann
hGen is a random CNF (conjunctive normal form) formula generator for sublanguages of H(@, ↓, A, P). It is an extension of the latest proposal of Patel-Schneider and Sebastiani, nowadays considered the standard testing environment for classical modal logics. The random generator is used for assessing the performance of different provers.
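To give an idea of what a random modal CNF generator produces, the sketch below generates clause sets for the basic modal language only, with a bounded modal depth and a fixed clause length. It is a simplified illustration and not hGen's algorithm: the parameter names, the default values and the tuple encoding are assumptions, and the hybrid operators are not covered.

```python
import random

def random_literal(rng, num_props, clause_len, depth, p_modal):
    """A possibly negated atom; an atom is a proposition or a box over a random clause."""
    if depth > 0 and rng.random() < p_modal:
        atom = ("box", random_clause(rng, num_props, clause_len, depth - 1, p_modal))
    else:
        atom = ("prop", rng.randrange(num_props))
    return ("not", atom) if rng.random() < 0.5 else atom

def random_clause(rng, num_props, clause_len, depth, p_modal):
    """A clause is a list (disjunction) of `clause_len` random literals."""
    return [random_literal(rng, num_props, clause_len, depth, p_modal)
            for _ in range(clause_len)]

def random_modal_cnf(seed, num_clauses, num_props=3, clause_len=3, depth=1, p_modal=0.5):
    """A formula is a list (conjunction) of clauses; the seed makes runs reproducible."""
    rng = random.Random(seed)
    return [random_clause(rng, num_props, clause_len, depth, p_modal)
            for _ in range(num_clauses)]

def modal_depth(literal):
    """Maximal nesting of boxes inside a literal."""
    atom = literal[1] if literal[0] == "not" else literal
    if atom[0] == "prop":
        return 0
    return 1 + max(modal_depth(lit) for lit in atom[1])

formula = random_modal_cnf(seed=42, num_clauses=4, depth=2)
print(len(formula), max(modal_depth(lit) for clause in formula for lit in clause))
```

Fixing the seed matters for benchmarking: the same test set can be regenerated and given to several provers, so that differences in running time reflect the provers and not the inputs.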
Version: 1.2
License: GPL
Last update: 2009-06-17
Web site:
http://
Authors: Carlos Areces, Daniel Gorín, Juan Heguiabehere and Guillaume Hoffmann
Contact: Carlos Areces
In the framework of the MEDIA project, software has been developed to process transcriptions of a spoken dialogue corpus and to provide a semantic representation of their task-related content. This software contains a tokeniser, an LTAG parser (LLP2), an LTAG grammar, an OWL ontology and a set of rules in description logic, and works together with a reasoner such as RACER. The current version contains a reference resolution module (anaphora and deixis) which is based on the referential domains theory. The package also contains ways to project the (referentially resolved) semantic form into the MEDIA formalism and to evaluate the accuracy of the representation using a test corpus. The whole system has been implemented in Java and communicates with other modules using TCP/IP.
Version: 0.5
License: GPL
Website:
http://
Last update: 12/11/2008
Project(s): MEDIA
Authors and Contact: Alexandre Denis
Nessie is a semantic construction tool written in OCaml. It takes a lexicon and a syntax tree as input and produces a semantic representation taking the form of a simply typed lambda term. Simply typed lambda calculus is used not only as the target language, but also as the glue language for assembling the representations provided by the lexicon.
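The principle of building a representation by applying lexical lambda terms along the syntax tree can be sketched in a few lines. The example below is only an illustration and does not reflect Nessie's input formats or implementation: Python functions stand in for lambda terms, no type checking is performed, and the lexicon, the tree encoding and the function interpret are hypothetical.

```python
# Lexicon: each word maps to a (curried) function that builds a formula string.
lexicon = {
    "john": lambda p: p("john"),  # type-raised proper name
    "walks": lambda x: f"walk({x})",
    "man": lambda x: f"man({x})",
    "every": lambda n: lambda p: f"forall x.({n('x')} -> {p('x')})",
}

def interpret(tree):
    """A tree is either a word (leaf) or a pair (functor_subtree, argument_subtree)."""
    if isinstance(tree, str):
        return lexicon[tree]
    functor, argument = tree
    # Function application is the only glue needed to assemble the meaning.
    return interpret(functor)(interpret(argument))

print(interpret(("john", "walks")))            # walk(john)
print(interpret((("every", "man"), "walks")))  # forall x.(man(x) -> walk(x))
```

In a typed setting, each lexical entry would carry a simple type, and an ill-formed tree (for instance one applying "walks" to "every") would be rejected before any representation is produced.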
This tool has been successfully used in several applications, the most notable of which is the computation of discourse semantics according to two different theories, namely the compositional DRT (Muskens 95) and the compositional treatment of dynamicity (de Groote 2006).
Future developments of Nessie may include using richer typing systems, and interfacing it with inference and rewriting tools to simplify the representations it produces.
Last update: 2008-11-14
Authors: Sébastien Hinderer
Contact: Sébastien Hinderer
DeDe is a corpus of roughly 50,000 words in which around 5,000 definite descriptions have been annotated as coreferential, contextually dependent, non referential or autonomous. The corpus consists of articles from the newspaper Le Monde and is annotated with Multext-based morphosyntactic information.
Authors: Claire Gardent, Hélène Manuelian
Web site: Distributed by the CNRTL
http://
Contact: Claire Gardent
A TAG grammar developed with the XMG metagrammar compiler which describes both the syntax and the semantics of a core fragment of French. Syntactically, the grammar covers the constructions described in A. Abeillé's book. Additionally, it is equipped with a unification based compositional semantics which supports both semantic construction (using LLP2, Tulipa or SemCONST) and surface realisation (using GenI).
Authors: Claire Gardent, Benoit Crabbé
Contact: Claire Gardent
A TAG grammar for English developed with the XMG metagrammar compiler which describes both the syntax and the semantics of English. Syntactically, the grammar has a coverage comparable to that of the XTAG grammar developed by the University of Pennsylvania. Additionally, the grammar integrates a unification based compositional semantics. It is used both for parsing (by LLP2 and SemCONST) and for generation (by GenI).
Authors: Claire Gardent, Katya Alahverdzhieva
Contact: Claire Gardent
The WikipediaAnnotator program provides semantic annotation of Wikipedia discussion pages. It annotates French Wikipedia participants' utterances on the connotation and subjective levels using deep syntax and shallow semantics.
Developing context: CCCP-Prosodie
Programming language: Java
Development effort: 18 man/month
Type of license: GPL
Partners: Telecom Paris Tech
Web site: (under construction)
ATOOL is a semantic annotation tool for the high-level semantic representation MMIL. It takes as input the MEDIA corpus (TEI format according to the specifications document of September 2009).
Developing context: PORT-MEDIA
Programming language: Java
Development effort: 4 months (student) + 1 month of evaluation and improvements.
Expected users: Annotators
Type of license: Open Source.
This tool allows the annotation of the utterances in the MEDIA corpus and stores all the information about predicates and arguments in a relational database.
Developing context: PORT-MEDIA
Programming language: JSP-JAVA
Development effort: 1 month.
Expected users: Annotators
Type of license: Open Source.
Web site:
http://
PORT-MEDIA is a framework supporting the automatic annotation of the MEDIA corpus with the high-level semantic representation MMIL. It is a blackboard architecture interfaced with the Tree Tagger, the Malt parser, the frames, the semantic role labeling and the HLSBuilder. Additionally, it is interfaced with RACER, two ontologies in OWL and a relational database holding all the information at the different linguistic levels. Finally, the evaluation software is also provided.
Developing context: PORT-MEDIA
Programming language: Java, MySQL, XML, XSLT, OWL.
Development effort: 12 months.
Expected users: Annotators
Type of license: Open Source.
The Web Service for the Multilingual-Assisted Chat Interface program (WSMACI) is a linguistic assistant for virtual worlds. Its first version is dedicated to English assistance in such worlds. It has been developed in the context of the Metaverse1 project. It provides end users with MLIF-based sentence analysis and word information (synonyms, definitions, translations) based on Google Translate, WordNet and the Brown Corpus.
Programming language: MLIF, PHP, SQL
Development effort: 6 man-months
Type of license: INRIA specific
Authors: Tarik Oswald, Samuel Cruz-Lara
Contact: Tarik Oswald
Web site: – (under construction)
The 4 Layers Emotion Detection program (4LED) is an emotion detection tool. Emotions are extracted from texts, in particular from chat interfaces in virtual worlds. It has been developed in the context of the Metaverse1 project. The emotion detection process is based on smiley detection using WordNet-Domains and Tree-Tagger-based rules, WordNet-Affect, and keywords.
Programming language: MLIF, PHP, SQL
Development effort: 6 man-months
Type of license: INRIA specific
Partners: ArtefactO (for rendering)
Authors: Tarik Oswald, Samuel Cruz-Lara
Contact: Tarik Oswald
Web site: – (under construction)
The Second Life Magic Carpet program (SLMC) is an assistant whose role is to guide people through virtual worlds with textual instructions. It has been developed in the context of the Metaverse1 project. It analyses the visitors' instructions in order to find where they want to go, using web services for the analysis, for synonym retrieval and for path finding.
Developing context: Metaverse1
Programming language: LSL, PHP, SQL
Development effort: 8 man-months
Type of license: INRIA specific
Partners: Innovalia Spain, Utrecht University
Authors: Tarik Oswald, Samuel Cruz-Lara
Contact: Tarik Oswald
Web site: – (under construction)
The DiacXis program seeks to analyse the evolution of textual information over time. It has been developed in the context of the CPER TALC action McFiiD. DiacXis uses a diachronic approach based both on the Multiview Data Analysis (MVDA) paradigm and on specific cluster labeling techniques developed especially for the statistical analysis of complex data such as textual data.
Programming language: C + Java
Development effort: 6 man-months
Type of license: INRIA specific
Partners: INIST, ITT Kanpur
Authors: Jean-Charles Lamirel, Navesh Pryankar
Contact: Jean-Charles Lamirel
Web site: – (under construction)
The IGNG-F program implements a new incremental clustering algorithm whose main domain of application is the statistical analysis of continuous flows of evolving textual data. It has been developed in the context of the CPER TALC (McFiiD action). It is based on a generic adaptation of the classical neural-based clustering approaches that use a gas of neurons with free topology. This adaptation resulted in a neural clustering technique that is parameter-free and independent of the description space, and that uses Hebbian learning and labeling expectation maximization instead of classical Euclidean or correlation distances.
Programming language: C
Development effort: 6 man-months
Type of license: INRIA specific
Partners: INIST, ITT Hyderabad
Authors: Jean-Charles Lamirel, Raghvendra Mall
Contact: Jean-Charles Lamirel
Web site: – (under construction)
The TextClus interface is a Java interface whose role is to provide end users with a whole set of clustering techniques applied to texts. It has been developed in the context of the CPER TALC McFiiD operation. The platform uses a vectorial representation of text data. It brings together, in a single interface, the management of different kinds of data preprocessing techniques (IDF, entropy, random mapping, ...), different kinds of clustering techniques (from standard methods to elaborate neural ones), and the management of multiple experiments together with the comparison of their results by method-independent clustering quality measures specifically adapted to the analysis of textual data.
Programming language: C
Development effort: 6 man-months
Type of license: INRIA specific
Partners: INIST
Authors: Jean-Charles Lamirel, Pascal Cuxac
Contact: Jean-Charles Lamirel
Web site: – (under construction)
We are investigating modal logics whose operators not only allow for exploration of the model in which they are evaluated, but can also modify it. These logics could be especially suitable for describing, for example, the semantics of utterances, since they can model the fact that uttering a sentence changes the context in which it was uttered.
We have investigated in detail a family of dynamic modal logics called memory logics: we established lower and upper complexity bounds, mapped their expressive power, and devised tableaux and model checking algorithms as well as sound and complete axiomatizations.
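To make the idea of an operator that modifies the model concrete, here is a minimal model checking sketch for a small memory logic over a finite Kripke model. It assumes the usual presentation of memory logics, with a "remember" operator that adds the current state to a memory set and a "known" operator that tests whether the current state is in that set. The tuple encoding of formulas, the function name `holds` and the example model are illustrative assumptions, not the team's algorithms or tools.

```python
# Formulas are nested tuples (an encoding chosen for this sketch):
#   ("prop", "p")   atomic proposition
#   ("not", f)      negation
#   ("and", f, g)   conjunction
#   ("dia", f)      diamond: f holds at some successor
#   ("rem", f)      remember: store the current state, then evaluate f
#   ("known",)      known: the current state is in the memory


def holds(formula, world, edges, valuation, memory=frozenset()):
    """Return True if `formula` holds at `world` with the given memory."""
    op = formula[0]
    if op == "prop":
        return world in valuation.get(formula[1], set())
    if op == "not":
        return not holds(formula[1], world, edges, valuation, memory)
    if op == "and":
        return all(holds(f, world, edges, valuation, memory)
                   for f in formula[1:])
    if op == "dia":
        return any(holds(formula[1], succ, edges, valuation, memory)
                   for succ in edges.get(world, ()))
    if op == "rem":
        # The operator changes the model: the memory grows by `world`.
        return holds(formula[1], world, edges, valuation, memory | {world})
    if op == "known":
        return world in memory
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical two-state model: "a" has a loop, "b" does not.
EDGES = {"a": ["a", "b"], "b": ["a"]}

# "Remember here, then reach a known state in one step": true exactly at
# states that have an edge to themselves.
HAS_LOOP = ("rem", ("dia", ("known",)))
```

With this model, `holds(HAS_LOOP, "a", EDGES, {})` is true and `holds(HAS_LOOP, "b", EDGES, {})` is false. Having a loop is a property the basic modal language cannot express, which gives a small illustration of the extra expressive power obtained by letting operators update the model.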
This year we have finally released stable versions of the two theorem provers (HyLoRes and HTab) for hybrid logics developed by the team. Moreover, the theoretical framework behind each of them has been described in detail in a PhD thesis and a journal article.
This work concentrates on integrating techniques from logic and artificial intelligence (notably planning) with work on pragmatics and the structure of dialogue.
We developed two generation systems for the international GIVE challenge. Interfaced with the 3D game provided by the challenge organisers, these systems guide the player with natural language instructions that they generate in real time according to the player's position in the game. The two systems were ranked first and third. Their development revived a long-standing Talaris/Led interest in the generation of referring expressions and launched a new line of research on situated referring expression generation and on the interface between 3D games and NLP.
Within the Interreg IV A Allegro project, we investigated how embedding text generation in a 3D game could help automate the generation of situated language exercises, i.e., exercises whose content varies with the 3D world context, with the learner level and with the teaching goal. We developed a system (I-FLEG) illustrating this interaction and made contact with language teachers to arrange for learner use and teacher testing. I-FLEG will be deployed in 2011 in language learning classes, where its usability for language teaching will be tested.
Paul Bédaride and Claire Gardent developed a new method for constructing semantic representations for textual data which is based on graph rewriting. They tested it on artificial data with good results and showed that it permits constructing deep semantic representations from dependency structures.
Making use of the fact that Regular Tree Grammars can generate the derivation trees of a Tree Adjoining Grammar, we developed an alternative surface realiser to GenI called RTGen. Preliminary results suggest that RTGen outperforms GenI. Current work concentrates on further optimising RTGen and on integrating top-down control to reduce the search space and constrain the output.
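As an illustration of how a regular tree grammar can describe derivation trees, the following sketch enumerates the trees generated by a toy grammar up to a height bound. The grammar, the nonterminal and tree names (`S_subst`, `VP_adj`, `alpha_sleeps`, `beta_often`, ...) and the encoding of "no adjunction" as a leaf production are all hypothetical choices made for this example; this is not RTGen's algorithm or data format.

```python
from itertools import product

# Toy regular tree grammar (hypothetical). Each nonterminal stands for a
# substitution or adjunction site; each production names an elementary tree
# and lists the sites that tree opens in turn.
RULES = {
    "S_subst": [("alpha_sleeps", ("NP_subst", "VP_adj"))],
    "NP_subst": [("alpha_john", ())],
    # An adjunction site is either left empty or filled by an auxiliary
    # tree, which itself offers a further adjunction site.
    "VP_adj": [("no_adjunction", ()), ("beta_often", ("VP_adj",))],
}


def derivation_trees(nonterminal, max_height):
    """Yield every tree of height <= max_height rooted in `nonterminal`.

    A tree is a pair (tree_name, children), where children is a tuple of
    trees; a leaf has height 1.
    """
    if max_height <= 0:
        return
    for tree_name, child_sites in RULES[nonterminal]:
        # All possible subtrees for each child site, one list per site.
        options = [list(derivation_trees(site, max_height - 1))
                   for site in child_sites]
        # Combine one subtree per site; an empty site list yields one leaf.
        for children in product(*options):
            yield (tree_name, children)
```

Calling `list(derivation_trees("S_subst", 3))` yields two derivation trees for this toy grammar, one without adjunction and one where `beta_often` adjoins once. Raising the height bound admits further adjunctions, which shows why a realiser working on such a grammar needs some form of control over the search space.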
Although verb classes and Semantic Role Labelling (SRL) have been shown to be essential components of semantic processing, there is to date no such resource or tool for French. Using Formal Concept Analysis and information from the LADL tables, we developed a method for classifying verbs based on their subcategorisation information, and a method for projecting thematic roles onto the Paris 7 dependency bank. We are currently extending the verb classification approach to integrate thematic grids and working on evaluating and improving the result. In the long run, we aim to provide a Propbank-style corpus for French and to develop a semantic role labeller based on this corpus.
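The following sketch shows what Formal Concept Analysis computes when verbs are the objects and subcategorisation frames are the attributes. Each formal concept pairs a set of verbs with the full set of frames they share, which can be read as a candidate verb class. The four verbs and three frame labels are invented for illustration and are not taken from the LADL tables, and the brute-force enumeration is only suitable for very small contexts; it is not the method used by the team.

```python
from itertools import combinations

# Hypothetical formal context: verb -> set of subcategorisation frames.
CONTEXT = {
    "dormir": {"N0 V"},
    "manger": {"N0 V", "N0 V N1"},
    "donner": {"N0 V N1", "N0 V N1 PP"},
    "envoyer": {"N0 V N1", "N0 V N1 PP"},
}


def extent(context, frames):
    """Verbs that accept every frame in `frames`."""
    return frozenset(v for v, fs in context.items() if frames <= fs)


def intent(context, verbs):
    """Frames shared by every verb in `verbs` (all frames if none given)."""
    if not verbs:
        return frozenset(set().union(*context.values()))
    return frozenset(set.intersection(*(set(context[v]) for v in verbs)))


def formal_concepts(context):
    """Return all (extent, intent) pairs by closing every frame subset."""
    all_frames = sorted(set().union(*context.values()))
    concepts = set()
    for size in range(len(all_frames) + 1):
        for subset in combinations(all_frames, size):
            verbs = extent(context, frozenset(subset))
            concepts.add((verbs, intent(context, verbs)))
    return concepts


if __name__ == "__main__":
    for verbs, frames in sorted(formal_concepts(CONTEXT),
                                key=lambda c: (len(c[0]), sorted(c[0]))):
        print(sorted(verbs), sorted(frames))
```

On this toy context the lattice contains six concepts, including one that groups "donner" and "envoyer" under the two frames they share. Ordering the concepts by extent inclusion gives the hierarchy from which verb classes could then be selected.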
In the context of the CCCP-Prosodie project, we showed that connotation and subjective markers are good evidence for conflicts between participants on Wikipedia discussion pages. We observed striking patterns of interaction during conflicts, especially the alternation of 1st person and 2nd person subjective markers in negatively connoted utterances. Further work will involve examining the various conflict types, with the aim of automatically distinguishing argument-based conflicts from personal ones (ad hominem).
TALARIS contributes to the ISO TC 37 committee “Terminologies and other Language Resources”, and more specifically to the activities of its SC3 “Computer Applications in Terminology” and SC4 “Linguistic Resources Management”. Within TC37/SC4, TALARIS is currently contributing, as project leader, to the definition and specification of the Multi Lingual Information Framework (MLIF) [ISO DIS 24616]. MLIF is being designed with the objective of providing a common abstract model able to generate several formats used in translation and localization. MLIF will soon be released as FDIS (Final Draft International Standard) and it should finally be published as an official ISO Standard within the first semester of 2011.
Cluster quality evaluation is a key issue for many data analysis tasks. As we showed in previous work, the classical distance-based quality indices are often strongly biased and highly dependent on the clustering method. To cope with such problems, we proposed in earlier work specific Macro-Recall/Precision and F-measure metrics that exploit the properties of cluster-associated data. However, our more recent experiments showed that these new metrics failed to highlight degenerate clustering results when analyzing complex textual data. To remedy this shortcoming, we devised two extensions of these metrics, namely Micro-Measures and Cumulated Micro-Measures. We then experimentally showed the effectiveness of our extended approach by applying it to several documentary corpora of highly polythematic bibliographic records taken from the PASCAL CNRS scientific database.
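The general difference between macro- and micro-averaging can be illustrated with a small sketch. It does not implement the metrics described above, which rely on the properties associated with clusters rather than on reference labels; it uses a simpler, label-based precision (the share of the majority label in each cluster) purely to show how averaging per cluster can hide a degenerate result that pooling the counts reveals. The function names and the example data are assumptions made for this illustration.

```python
from collections import Counter


def majority_counts(clusters):
    """Return (majority label count, cluster size) per non-empty cluster."""
    return [(Counter(labels).most_common(1)[0][1], len(labels))
            for labels in clusters if labels]


def macro_precision(clusters):
    """Average the per-cluster precision: every cluster weighs the same."""
    stats = majority_counts(clusters)
    return sum(major / size for major, size in stats) / len(stats)


def micro_precision(clusters):
    """Pool the counts first: every item weighs the same."""
    stats = majority_counts(clusters)
    return sum(major for major, _ in stats) / sum(size for _, size in stats)


# Hypothetical degenerate clustering: one large mixed cluster that absorbs
# most of the data, plus two pure singleton clusters.
DEGENERATE = [["A"] * 4 + ["B"] * 4, ["A"], ["B"]]

if __name__ == "__main__":
    print("macro:", macro_precision(DEGENERATE))
    print("micro:", micro_precision(DEGENERATE))
```

On this example the macro-averaged value is 5/6 while the micro-averaged value is 0.6: the two pure singletons inflate the per-cluster average even though most items sit in a mixed cluster.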
The literature that takes into account the chronological aspect of textual information flows focuses on the "DataStream" approach, whose main idea is the "on the fly" management of incoming data. However, the proposed algorithms are intended to treat very large volumes of data and are thus not optimal for detecting emergent topics such as, for example, the evolution of a research theme within bibliographical records. To address this issue, we proposed a new approach based on our Multiview Data Analysis Paradigm (MVDA), in combination with specific cluster labeling techniques developed especially for the statistical analysis of complex data such as textual data. We applied our approach to the IST PROMTECH reference dataset related to optoelectronic research. When compared with state-of-the-art approaches, our method provided significant added value by making it possible to highlight and quantify precisely the observed evolutions and their related context, ranging from vocabulary changes in a given topic to the overall appearance or disappearance of topics, or even to the splitting or merging of topics.
Neural clustering algorithms show high performance in the general context of the analysis of homogeneous textual datasets. However, we showed that the performance of these algorithms, as well as that of the more classical algorithms, decreases drastically when they are applied to heterogeneous or polythematic textual datasets. Such degradation indicates that most of the existing clustering methods, even those that are considered incremental, are not really able to deal with highly time-varying data. We therefore proposed a new approach to incremental clustering based on a generic adaptation of the classical neural-based clustering methods that rely on a gas of neurons with free topology. This adaptation resulted in a neural clustering technique that is parameter-free and independent of the description space, and that uses Hebbian learning and labeling expectation maximization instead of classical Euclidean or correlation distances. We showed that our approach significantly outperformed the existing ones in all experimental contexts, and specifically in the case of highly time-varying data.
The WICRI project aims to explore a new concept of "wiki network" with adaptive capabilities. In particular, one aim is to propose strategies for enriching the content of the wiki network by dynamic integration and exploitation of Web data. Hence, Web mining represents an important challenge for enhancing the dynamicity, the flexibility and the scope of such a network. On the one hand, this process is needed to assist potential contributors with elaborate and reliable writing guidelines during the network construction phase. On the other hand, it is essential for supplying end users with external information whose added value lies in its significant relationships with the semantic context of the wiki network. Although the WICRI project is still in its launch phase, our preliminary prototype of a "network of wikis" is already acting as an online collaborative research platform.
Theme: Clustering; Statistical Analysis; Textual data; Time-evolving data; Distributed data
Description: The McFiiD project is a CPER project continuing the CPER CLASSIF project. It concerns the development of incremental multi-clustering techniques for managing distributed and evolving flows of textual data. New approaches to diachronic analysis, based on the use of multiple viewpoints combined with unsupervised Bayesian reasoning, as well as new online incremental clustering techniques based on non-standard similarity measures, are tested in the course of this project.
Administrative context: CPER
Web site:
http://
Period: start 2007-01-01 / end 2011-12-31
Contact: Jean-Charles Lamirel
Partner(s): INIST, LORIA
Theme: Discourse, Dialogue and Pragmatics; Logics for Natural Language and Knowledge Representation
Description: The goal of CCCP-Prosodie is to empirically investigate the functioning of online communities (such as Wikipedia), and in particular to link their activities and their use of language (as recorded in such corpora as email exchanges, for example). The TALARIS team is involved in this project for three reasons: to provide natural language processing tools, to design an annotation scheme capable of dealing with information from both the social sciences (sociology and economics) and the humanities (psychology and ergonomics), and to provide help with inference technology.
Administrative context: ANR CONTINT
Web site:
http://
Period: start 2008-01-12 / end 2011-31-06
Contact: Alexandre Denis
Partner(s): Institut Télécom, UTC Compiègne, UNSA (Univ. Nice Sophia-Antipolis), Univ. de Versailles St-Quentin
Theme: Corpus Linguistics, Semantic Annotation of Corpora; Discourse, Dialogue and Pragmatics; Natural Language Understanding and Knowledge Representation
Description: The PORT-MEDIA project is an ANR project that aims to collect linguistic data for multiple domains and to investigate the use of a high-level semantic representation for annotating dialogue corpora. TALARIS contributed to the high-level semantics specification for annotating the MEDIA corpus and to the development of tools for manual annotation (e.g., ATOOL and SRL-Web Annotation), as well as to the development of the blackboard architecture for the automatic annotation of the MEDIA corpus. Additionally, TALARIS provided the automatic annotation of the whole corpus and its evaluation.
Administrative context: ANR CONTINT
Web site:
http://
Period: start 2009-03-01 / end 2012-03-01
Contact: Matthieu Quignard, Lina M. Rojas-Barahona
Partner(s): ELDA, LIG/GETALP, LIA, LIUM, LORIA
Theme: Computational Semantics
Description: The PASSAGE project has two main aims. The first is to improve the robustness and precision of existing computational grammars for French, and to use them on large corpora (corpora containing several million words). The second is to exploit the resulting syntactic analyses to create richer linguistic resources (such as treebanks) for the French language.
Administrative context: ANR MDCA
Web site:
http://
Period: start 2007-01-01 / end 2010-06-30
Contact: Claire Gardent
Partner(s): CEA-LIST, LIMSI, INRIA Rocquencourt, CNRS
Theme: Computational Semantics
Description: The Allegro project aims to develop NLP techniques that support language teaching for French and German.
Administrative context: INTERREG IV A
Web site:
http://
Period: start 2010-01-01 / end 2012-12-31
Contact: Claire Gardent
Partner(s): Saarbrücken University, Supelec Metz, INRIA Nancy Grand Est
Theme: Computational Semantics
Description: The EMOSPEECH project aims to augment serious games with natural language (spoken and written dialog) and emotional abilities (gesture, intonation, facial expressions).
Administrative context: Eurostars
Period: start 2010-09-01 / end 2013-08-31
Contact: Claire Gardent
Partner(s): Artefacto, Acapella, INRIA Nancy Grand Est
Theme: Multilinguality for Multimedia
Description: Metaverse is a project whose goal is to provide a standardized global framework enabling interoperability between virtual worlds (for example Second Life, World of Warcraft, IMVU, Active Worlds, Google Earth and many others) and the real world (sensors, actuators, vision and rendering, social and welfare systems, banking, insurance, travel, real estate and many others).
Administrative context: ITEA2 07016
Web site:
http://
Period: start 2009-01-01 / end 2011-12-31
Contact: Samuel Cruz-Lara
Partner(s): Belgian partners: Alcatel-Lucent Bell N.V., Nazooka, IBBT-SMIT; French partners: Alcatel-Lucent France, Orange Labs, CEA List, Artefacto; Greek partners: Forthnew S.A., Ellinogermaniki Agogi; Dutch partners: Philips Research, Philips I-Lab, DevLab, Technical University Eindhoven, University of Twente, Stg. EPN, VU Economics & BA, VU CAMeRA; Spanish partners: Innovalia, Ceeda, VirtualWare, CBT, Nextel, Corsa, Avantalia, I&IMS, VicomTECH, E-PYME, CIC Tour Game, UPF-MTH; Israeli partners: Metaverse Labs.
Theme: Multilinguality for Multimedia
Description: The goal of the SEMbySEM project is to develop a new open source supervision system adapted to the increasing complexity of “systems of systems”. This new supervision system will be based on the extensive use of semantic technologies (notably ontologies). It will provide a set of tools allowing dedicated supervision systems to be set up according to the various stakeholders' needs and domain knowledge.
The TALARIS team's contribution to this project will center on providing language technology for developing, maintaining, and enriching ontologies and on developing ISO standards for multilingual user interfaces.
Administrative context: ITEA2 07021
Web site:
http://
Period: start 2008-07-31 / end 2010-12-31
Contact: Samuel Cruz-Lara
Partner(s): Finnish partners: Identoi, LogiNets, Oliotalo, VTT; French partners: Thales (Project Leader), ArcInformatique, CityPassenger, LISSI (Université de Paris 12), LIG (IMAG Grenoble); Spanish partners: Trimek, DataPixel, SQS, CBT, Innovalia; Turkish partners: AGM Lab, METU.
Theme: Logics for Natural Language and Knowledge Representation
Description: The main aim of the InToHyLo project is to investigate inference methods for hybrid logics, to develop highly optimized inference tools based on these methods, and to use these tools in natural language applications. Talaris and GLyC are currently leaders in automated theorem proving for hybrid logics, and they are the developers of the two provers HyLoRes (based on resolution) and HTab (based on tableaux). With the InToHyLo project we want to investigate how to combine resolution and tableaux algorithms to allow our provers to collaborate and share partial results. We will integrate our tools in a platform suitable for inference in NLP applications (focusing on Dialogue Systems and Textual Entailment). This platform will include tools not only for satisfiability testing, but also for model building, model checking, bisimulation checking, and knowledge maintenance and retrieval. Finally, we want to develop parallel inference algorithms to improve performance, and distributed testing to speed up development.
Administrative context: INRIA (Equipes Associées)
Web site:
http://
Period: start 2009-01 / end 2012-01
Contact: Carlos Areces
Partner(s): Universidad de Buenos Aires, Argentina.
Luciana Benotti defended her PhD thesis at the Université Henri Poincaré, entitled Implicature as an Interactive Process and supervised by Patrick Blackburn, on 28 January 2010.
Dimitri Sustretov defended his PhD thesis at the Université Henri Poincaré, entitled Topological semantics for hybrid logic and supervised by Patrick Blackburn, on 9 July 2010.
Paul Bédaride defended his PhD thesis at the Université Henri Poincaré, entitled Implication textuelle et réécriture and supervised by Claire Gardent, on 18 October 2010. He is now a postdoc at Stuttgart University.
Guillaume Hoffman defended his PhD thesis at the Université Henri Poincaré, entitled Tâches de raisonnement en logiques hybrides and supervised by Patrick Blackburn and Carlos Areces, on 13 December 2010.
Jean-Charles Lamirel defended his habilitation (HdR) at the Université Henri Poincaré, entitled Vers une approche systémique et multivues pour l'analyse de données et la recherche d'information : un nouveau paradigme, on 6 December 2010.
Carlos Areces:
Member of the Management Board of the Association for Logic, Language and Information (FoLLI), 2005-2010.
Patrick Blackburn
Liaison officer for the Erasmus Mundus Masters in Language and Communication Technology.
Samuel Cruz-Lara
In charge, at the national level, of the reception of Mexican students in the “Professional Licences of Computer Science”.
Member of W3C's SYnchronized MultiMedia Group
http://
Member of ISO's TC37 “Terminologies and other Language Resources” / SC4 “Linguistic Resources Management”. Project leader of the Multi Lingual Information Framework (ISO DIS 24616).
Christine Fay-Varnier
Vice president of the Council of studies and university life of the INPL.
Representative of the INPL for the steering committee TICE (Information and Communication Technology for Education) for Nancy University.
Claire Gardent
Member of the LORIA steering committee.
Coordinator of the TALC theme (Computational Linguistics and Computational Approaches to Knowledge) for the MISN CPER (National and Regional Research Funding).
Organiser of the LORIA TALC seminar
http://
Local organiser for the NaTAL 2010 workshop
http://
Jean-Charles Lamirel:
Member of the Management Board of the Collnet international research group in Scientometrics/Informetrics/Webometrics, 2005-2010.
Fabienne Venant
Member of the Administrative Council of ATALA, the French national organisation for computational linguistics (see
http://
Carlos Areces
Member of the Editorial Board of the FoLLI Publications on Logic, Language, and Information (part of the Lecture Notes in Artificial Intelligence series published by Springer-Verlag). Since 2006.
Member of the Scientific Board of The Baltic International Yearbook of Cognition, Logic and Communication. Since 2005.
Member of the Editorial Board of the Journal of Logic, Language and Information. Since 2004.
Member of the Editorial Board of the Journal of Applied Logic. Since 2004.
Member of the Organizing Committee of the Workshop on NLP and Web-based technologies, held in conjunction with IBERAMIA 2010, Bahía Blanca, Argentina.
Member of the Program Committee of the 36th Latin American Conference of Informatics, Asunción, Paraguay.
Member of the Program Committee of the 2010 Workshop on Hybrid Logics (HyLo 10), Edinburgh, United Kingdom.
Member of the Program Committee of the 1era Escuela de Lingüística Computacional (ELiC-1), Buenos Aires, Argentina.
Member of the Program Committee of the 2010 International Workshop on Description Logics (DL2010), Waterloo, Canada.
Member of the Program Committee of the International Joint Conference on Automated Reasoning (IJCAR10), Edinburgh, United Kingdom.
Member of the Program Committee of Advances in Modal Logic (AiML10), Moscow, Russia.
Patrick Blackburn
Editor of Review of Symbolic Logic.
Editor of Notre Dame Journal of Formal Logic.
Member of the Editorial Board of Logique et Analyse.
Subject editor (Logic and Language), Stanford Encyclopedia of Philosophy.
Member of the Program Committee of Advances in Modal Logic 2010 (AiML 2010).
Member of the Program Committee of Hybrid Logic 2010 (HyLo 2010).
Member of the Program Committee of the Workshop on Theories of Information Dynamics and Interaction and their Application to Dialogue 2010 (TIDIAD@ESSLLI 10).
Nadia Bellalem
PC Member for ICEIS 2010 (The 12th International Conference on Enterprise Information Systems), Madeira, Portugal.
Samuel Cruz-Lara
PC member for KEOD 2010 (The International Conference on Knowledge Engineering and Ontology Development), Valencia, Spain.
Member of the Editorial Board of Revista Iberoamericana de Tecnologías del Aprendizaje
Claire Gardent
PC member for TALN 2010 (Traitement Automatique des Langues Naturelles), Montréal, Canada.
PC member for SemDial 2010 (14th Workshop on the Semantics and Pragmatics of Dialogue), Poznan, Poland.
PC member for RFIA 2010 (17ème Colloque Francophone sur la Reconnaissance des Formes et l'Intelligence Artificielle), Caen, France.
PC member for EMNLP 2010 (Conference on Empirical Methods in Natural Language Processing), MIT, USA.
PC member for LREC 2010 (The seventh international conference on Language Resources and Evaluation), Malta.
PC member for INLG 2010 (12th European Workshop on Natural Language Generation), Dublin, Ireland.
PC member for ACL 2010 (Annual meeting of the Association for Computational Linguistics), Uppsala, Sweden.
Jean-Charles Lamirel
Member of the editorial board of the new international journal “COLLNET Journal of Scientometrics and Information Management”, Taru Publications, New Delhi, India (
http://
Reviewer for the Neural Networks, Geographical Information Systems and Collnet International Journals.
Program chair of the VSST 2010 Technological and Strategic Survey Conference, Toulouse, France, October 2010.
Organizer of the Special Session on Incremental clustering and novelty detection techniques and their application to intelligent analysis of time varying information, in the framework of the IEA/IAE International Conference, Syracuse, NY, USA, June 2011.
Co-organizer of the EGC Workshop: Clustering incrémental et méthodes de détection de nouveauté et leur application à l'analyse intelligente d'information évoluant au cours du temps, EGC 2011 Workshop, Brest, France, January 2011.
Christine Fay-Varnier
Member of the organizing committee of TICE 2010 conference (http://www.tice2010.nancy-universite.fr/)
Carlos Areces
Invited one week course at ELiC-1, Buenos Aires, Argentina. Invited one week course at NASSLLI 2010, Indiana, USA. Invited talk at JCC 2010, Rosario, Argentina.
Patrick Blackburn
M2 course “Mathematics for Computer Science: Introduction to computability and computational complexity”. 15 hours, Erasmus Mundus Master “Language and Communication Technology”, University of Nancy 2, France,
http://
M2 course “Discourse and Dialogue”. 15 hours, Erasmus Mundus Master “Language and Communication Technology”, University of Malta,
http://
Samuel Cruz-Lara
M2 course “Cognitive Sciences and Digital Media Technologies”. 22 hours, Cognitive Sciences, University of Nancy 2, France.
M2 course “Declarative Languages and Multimedia Applications”. 40 hours, Cognitive Sciences, University of Nancy 2, France.
M2 course “Video: Streaming and Captioning”. 12 hours, Cognitive Sciences, University of Nancy 2, France.
Claire Gardent
Software project tutoring “Error mining and Surface Realisation”, Erasmus Mundus Master “Language and Communication Technology”, Nancy 2.
Invited tutorial on “Natural language processing and computer aided language learning”, 4th intensive Summer School and Collaborative Franco-Thai Workshop on Natural Language Processing, Kasetsart University, Bangkok, Thailand.
Jean-Charles Lamirel
Master and PhD course on “Text Mining techniques applied to Linguistics: Introduction to the use of statistical methods for the analysis of literature. Case studies in French Literature”. 20 hours, University of Algiers.