Discourse Parallelism, Ellipsis, and Ambiguity

SIGNES: Linguistic signs, grammar and meaning: computational logic for natural language (SYM)

Christian Retoré, Professor, Université Bordeaux 1

Corinne Brisset, TR, INRIA (since July 2005)

Brigitte Larue-Bourdon, TR, INRIA (before July 2005)

Christian Bassac, Lecturer, Université Bordeaux 3 (on INRIA sabbatical since September 2005)

Joan Busquets, Lecturer, Université Bordeaux 3

Christian Clément, Lecturer, Université Bordeaux 1 (hired in September 2005)

Irène Durand, Lecturer, Université Bordeaux 1 (since September 2005)

Kim Gerdes, Lecturer, Université Bordeaux 3

Patrick Henry, IR, C.N.R.S., LaBRI

Gérard Huet, DR, INRIA (50% project-team Cristal, 50% Signes)

Renaud Marlet, CR, INRIA

Richard Moot, CR, C.N.R.S., LaBRI

Henri Portine, Professor, Université Bordeaux 3

Olivier De Langhe, Instructor, Institut national des Jeunes Sourds de Bordeaux Gradignan

Alain Lecomte, Professor, Université Grenoble 2

Yannick Le Nir, Lecturer, EISTI Pau

Maxime Amblard, Ministry grant, Université Bordeaux 1

Houda Anoun, Ministry grant, Université Bordeaux 1

Roberto Bonato, Italian grant for a co-supervised PhD, Università di Verona and Université Bordeaux 1

Pierre Guitteny, Interpreter/Official representative of the Direction Régionale des Affaires Sanitaires et Sociales Aquitaine, Université Bordeaux 3

Emilie Voisin, Aquitaine Regional grant, Université Bordeaux 3 (since September 2004)

Joint team with LaBRI and the Department of Linguistics of Université Bordeaux 3 Michel de Montaigne, in particular with the Research Ministry's Jeune Equipe JE 2385 TELANCO and the C.N.R.S. UMR 5610 ERSS.

LaBRI is a joint C.N.R.S. team (UMR 5800) involving Université Bordeaux 1 and the Ecole Nationale Supérieure d'Electronique, d'Informatique, et de Radiocommunications de Bordeaux (ENSEIRB).

ERSS is a joint C.N.R.S. team involving Université Toulouse-Le Mirail and Université Michel de Montaigne in Bordeaux.

TELANCO is a team of Université Michel de Montaigne, recognized by the Research Ministry as Jeune Equipe 2385.

Overall Objectives

The Signes team addresses several domains of computational linguistics, such as:

inflectional and derivational morphology

syntax

logical (or predicative) semantics

lexical semantics

discourse representation

by means of formal methods such as:

formal language theory

categorial grammars

resource logic

lambda calculus

higher order logic

Two applications illustrate this approach:

natural language tools for Sanskrit

modelling of French Sign Language grammar

We also develop the corresponding computational linguistics tools. Ultimately, these tools will result in a significant generic NLP platform encompassing analysis, generation and acquisition devices. Some languages receive particular attention: Sanskrit, French Sign Language, and French.

Scientific Foundations

The center: natural language syntax and semantics

Keywords: computational linguistics, natural language processing, NLP, formal languages, logic

Since the early days of computer science, natural language has been both one of its favorite application fields and a source of technical inspiration, as exemplified by the relation between formal language theory and linguistics.

Nowadays, the motivation is twofold: the need to handle large amounts of digitized text and even speech, in particular on the Internet, and the interesting mathematical and computational questions raised by computational linguistics, which can lead to other applications.

The most common natural language tools are information retrieval systems and spell checkers and, to a lesser extent, natural language generation, automatic summarization, and computer-aided translation systems.

Statistical methods and corpus linguistics have been quite successful in recent years, but symbolic methods, especially logical ones, are undergoing a renewal because of advances in logic, the increased computing power available for these rather slow algorithms, and above all the need for systems that handle the meaning of phrases, sentences, or discourses.

For all these applications, such as natural language queries, refined information retrieval, natural language generation, or computer-aided translation, we need to relate the syntax of an utterance to its meaning. This relation, known as the syntax/semantics interface, and its automation are at the center of this project. The notion is generally applied to sentences, but we also work on extending this correspondence to discourse and dialogue.

The study of the interface between syntax and semantics raises interesting questions of several kinds:

As said above, it enables applications that require access to, and computation of, meaning.

Up to now, semantics has played only a minor role in natural language processing, although from a linguistic viewpoint the two sides of the linguistic sign, its signifiant and its signifié, have been a central subject ever since Saussure. Linking the observable part of the sign or of the sentence to its meaning is a constant question in linguistics, both in Chomsky's Generative Grammar and in Mel'čuk's Meaning-Text Theory.

From a mathematical and algorithmic viewpoint, this interface raises several challenges. What is the link between two of the main frameworks, namely generative grammars and categorial grammars? The former are exemplified by Tree Adjoining Grammars (TAGs) and Minimalist Grammars. They enjoy efficient parsing algorithms and broad coverage of syntactic constructs. The latter are less efficient but provide more accurate analyses. Indeed, categorial grammars are used for syntax as well as for logical or predicative semantics such as Montague semantics, and thus allow for generation algorithms. Other models, such as dependency grammars, provide a different account of the syntax/semantics interface. Comparing the dependency model with a generative/logical one makes it possible to assess the adequacy of these families of models, which is one of the main challenges of contemporary formal linguistics.

At one end of our spectrum stands morphology, which, as is often the case in generative grammar, we consider part of syntax. It should nevertheless be observed that the computational models involved in morphological processing are of a different kind: finite state automata, regular transducers, etc.

At the other end, on the semantic side, we do not consider ontological aspects of semantics or lexical semantics, but rather extend logical semantics to discourse and dialogue. This is usually done with Discourse Representation Theory, which is top-down, incremental, and involves state changes.

Word structure and automata: computational morphology

Participants: Gérard Huet, Kim Gerdes

Keywords: finite state automata, transducers, morphology

Computational models for phonology and morphology are a traditional application of finite state technology. These models often combine symbolic or logical systems, such as rewriting systems, with statistical methods, such as probabilistic automata, which can be learned from corpora, for instance as Hidden Markov Models.

Morphology is described by means of regular transducers and regular relations; lexical databases, as well as tables of phonological and morphological rules, are compiled or interpreted by algebraic operations on automata.

The existing techniques for compiling such machinery are not widely known, and any naive approach leads to a combinatorial explosion. When transformation rules are local, they can be compiled into an invertible transducer obtained directly from the tree that encodes the lexicon.

A generic notion of sharing yields compact representations of such automata. Gérard Huet has implemented a toolkit based on this technique, which allows very efficient automatic segmentation of continuous phonological text.
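To make the segmentation task concrete, here is a minimal Python sketch, assuming a lexicon stored as a trie of nested dictionaries and an input written without word boundaries. It enumerates every way of splitting the input into lexicon words. It deliberately leaves out what makes Huet's toolkit effective, namely the compilation of euphony (sandhi) rules into the transducer and the sharing of common sub-structures; the function names and the toy lexicon are illustrative, not Zen's API.

```python
from functools import lru_cache

END = "$"  # marks the end of a lexicon word inside the trie


def build_trie(words):
    """Store the lexicon as a trie of nested dicts, one edge per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def segmentations(text, trie):
    """Return every split of `text` into lexicon words (no sandhi rules applied)."""

    @lru_cache(maxsize=None)
    def from_position(i):
        if i == len(text):
            return [[]]
        results = []
        node = trie
        for j in range(i, len(text)):
            node = node.get(text[j])
            if node is None:      # no lexicon word continues with this prefix
                break
            if END in node:       # text[i:j+1] is a word: segment the remainder
                for rest in from_position(j + 1):
                    results.append([text[i:j + 1]] + rest)
        return results

    return from_position(0)


lexicon = build_trie(["the", "them", "me", "mess", "age", "a", "message"])
print(segmentations("themessage", lexicon))  # [['the', 'mess', 'age'], ['the', 'message']]
```

Memoizing on the start position means each suffix of the input is analyzed only once, although the list of segmentations itself can still grow exponentially with the input length.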

This study of the linear structure of language and of word structure is by itself sufficient for applications such as spell checkers and text mining. Furthermore, this preprocessing is required for analyzing the other layers of natural language, such as syntax, semantics, and pragmatics.

Sentence structure and formal grammars: syntax

Participants: Maxime Amblard, Roberto Bonato, Kim Gerdes, Alain Lecomte, Renaud Marlet, Richard Moot, Christian Retoré

Keywords: formal grammars, categorial grammars, tree adjoining grammars, dependency grammars

While linear structure is in general sufficient for morphological structure, trees are needed to depict phrasal structure, and, in particular, sentence structure. Different families of syntactic models are studied in Signes: rewriting systems of the Chomsky hierarchy, including tree grammars, and deductive systems, i.e. categorial grammars.

The former, rewrite systems, have excellent computational properties and fairly good descriptive adequacy. The classes of grammars relevant to natural language syntax, which generate the so-called mildly context-sensitive languages, lie just beyond the context-free languages, and they are also parsable in polynomial time. Among these classes, let us mention Tree Adjoining Grammars and Minimalist Grammars. Dependency Grammars share some properties with them, but their general paradigm is quite different.

Edward Stabler introduced Minimalist Grammars (MGs) as a formalization of the most recent model of the Chomskyan, or generative, tradition, and they are quite appealing to us: they offer a uniform model for the syntax of all human languages.

There are two universal, language-independent rules, called merge and move: merge combines phrases, and move displaces phrases (or smaller units, such as heads).

A particular language is then defined by a language-dependent lexicon that assigns words features describing their syntactic behavior: some features trigger merge and others trigger move. Features come in positive and negative variants, which must cancel each other out during the derivation (this is rather close to resource logics and categorial grammars).
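The feature-cancellation mechanism can be illustrated with a simplified, string-based sketch of Stabler-style merge and move. It assumes head-initial word order (complements to the right of lexical heads, specifiers to the left), the notation =x/x for selection and +f/-f for licensing, and a toy lexicon; head movement and the rest of the full formalism are omitted, so this is an illustration, not the team's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """A minimalist expression: the head chain plus chains still waiting to move."""
    string: str            # phonetic material of the head chain
    features: tuple        # remaining features, e.g. ("=d", "=d", "v")
    lexical: bool = False  # lexical heads take their complement to the right
    movers: tuple = ()     # (string, features) pairs for pending movers


def join(*parts):
    return " ".join(p for p in parts if p)


def merge(selector, selectee):
    """A selector feature =x cancels the category feature x of the selectee."""
    sel, cat = selector.features[0], selectee.features[0]
    if sel != "=" + cat:
        raise ValueError(f"{sel} cannot check {cat}")
    rest_sel, rest_cat = selector.features[1:], selectee.features[1:]
    movers = selector.movers + selectee.movers
    if rest_cat:  # selectee still carries licensees (-f): it becomes a mover
        return Expr(selector.string, rest_sel, False, movers + ((selectee.string, rest_cat),))
    if selector.lexical:  # complement goes to the right of a lexical head
        return Expr(join(selector.string, selectee.string), rest_sel, False, movers)
    # specifier goes to the left of a derived head
    return Expr(join(selectee.string, selector.string), rest_sel, False, movers)


def move(expr):
    """A licensor +f on the head cancels -f on the unique mover carrying it."""
    lic = expr.features[0]
    if not lic.startswith("+"):
        raise ValueError(f"{lic} is not a licensor")
    hits = [i for i, (_, f) in enumerate(expr.movers) if f[0] == "-" + lic[1:]]
    if len(hits) != 1:
        raise ValueError("expected exactly one matching mover")
    s, feats = expr.movers[hits[0]]
    others = expr.movers[:hits[0]] + expr.movers[hits[0] + 1:]
    if len(feats) == 1:  # final landing site: the mover's string goes to the left
        return Expr(join(s, expr.string), expr.features[1:], False, others)
    return Expr(expr.string, expr.features[1:], False, others + ((s, feats[1:]),))


likes = Expr("likes", ("=d", "=d", "v"), lexical=True)
mary = Expr("Mary", ("d",), lexical=True)
who = Expr("who", ("d", "-wh"), lexical=True)
c_wh = Expr("", ("=v", "+wh", "c"), lexical=True)  # silent interrogative complementizer

vp = merge(merge(likes, who), mary)  # "Mary likes", with "who" pending
cp = move(merge(c_wh, vp))
print(cp.string, cp.features)  # who Mary likes ('c',)
```

The derivation yields an embedded question such as the one in "I wonder who Mary likes": "who" is merged as the object but, since it still carries -wh, it is set aside as a mover until +wh on the silent complementizer attracts it. A derivation succeeds only if every feature has been cancelled except the final category c.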

Consequently, they can describe numerous syntactic constructs, providing analyzed sentences with a fine-grained and complete syntactic structure. The richer the syntactic structure, the easier it is to compute a semantic representation of the sentence.

They also cover phenomena beyond syntax: they include morphology via inflectional categories, and they incorporate some semantic phenomena such as the relations between pronouns and their possible antecedents, quantifiers, etc.

A drawback of rewrite systems, including minimalist grammars, is that they do not allow for learning algorithms that could automatically construct or enlarge grammars from structured corpora. But their main drawback is the absence of internal structure in their non-terminals, which gives no hint about the predicative structure of the sentence.

Indeed, a strong reason for using categorial grammars, despite their poor computational properties and limited linguistic coverage, is that they provide a correspondence between syntactic analyses and semantic representations. This is explained in the next section, on the syntax/semantics interface.

In order to improve the computational properties of categorial grammars and to extend their scope, one can try to connect them to more efficient, broader-coverage formalisms, such as minimalist grammars.

Sentence structure and logic: the syntax/semantics interface

Participants: Maxime Amblard, Roberto Bonato, Joan Busquets, Alain Lecomte, Renaud Marlet, Richard Moot, Christian Retoré

Keywords: categorial grammars, Montague semantics, computational semantics

Why is there a simple and computable correspondence between syntax and semantics in categorial grammars? This is mainly due to the internal functional structure of non-terminals (categories) in categorial grammars, which maps onto the types of semantic formulae and functions. This correspondence between syntactic and semantic categories extends to terms, that is, to analyses, because the usual logic of the typed lambda-calculus is an extension of the resource logic used for syntactic deductions, i.e. analyses.
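A minimal sketch of this correspondence for AB (Ajdukiewicz-Bar-Hillel) grammars, assuming a toy lexicon and Python closures standing in for lambda terms: the category of each word determines its semantic type homomorphically (np to e, s to t, n to e -> t, and both slashes to function types), and every application step in the syntax is a function application in the semantics. The determiner entry needs the type-raised subject category s/(np\s), an instance of the ad hoc typing discussed in the next paragraph.

```python
# Categories: atoms are strings; ("/", B, A) is B/A (argument A on the right),
# ("\\", A, B) is A\B (argument A on the left). Both yield B.
ATOM_TYPES = {"np": "e", "s": "t", "n": ("e", "t")}


def sem_type(cat):
    """Map a syntactic category to a semantic type; (a, b) stands for a -> b."""
    if isinstance(cat, str):
        return ATOM_TYPES[cat]
    if cat[0] == "/":
        _, result, arg = cat
    else:
        _, arg, result = cat
    return (sem_type(arg), sem_type(result))


def apply(left, right):
    """Forward or backward application on (category, meaning) pairs."""
    (lcat, lsem), (rcat, rsem) = left, right
    if not isinstance(lcat, str) and lcat[0] == "/" and lcat[2] == rcat:
        return (lcat[1], lsem(rsem))      # B/A  A  =>  B
    if not isinstance(rcat, str) and rcat[0] == "\\" and rcat[1] == lcat:
        return (rcat[2], rsem(lsem))      # A  A\B  =>  B
    raise ValueError(f"cannot combine {lcat} with {rcat}")


def interpret(tree, lexicon):
    """A tree is a word or a (left, right) pair; meanings compose as categories do."""
    if isinstance(tree, str):
        return lexicon[tree]
    left, right = tree
    return apply(interpret(left, lexicon), interpret(right, lexicon))


IV = ("\\", "np", "s")   # np\s: intransitive verb
TV = ("/", IV, "np")     # (np\s)/np: transitive verb
SUBJ = ("/", "s", IV)    # s/(np\s): type-raised subject noun phrase
LEXICON = {
    "John": ("np", "john"),
    "Mary": ("np", "mary"),
    "student": ("n", lambda x: f"student({x})"),
    "sleeps": (IV, lambda x: f"sleep({x})"),
    "likes": (TV, lambda y: lambda x: f"like({x},{y})"),
    "every": (("/", SUBJ, "n"),
              lambda p: lambda q: f"forall x.({p('x')} -> {q('x')})"),
}

print(interpret(("John", ("likes", "Mary")), LEXICON))     # ('s', 'like(john,mary)')
print(interpret((("every", "student"), "sleeps"), LEXICON))  # ('s', 'forall x.(student(x) -> sleep(x))')
print(sem_type(TV))                                         # ('e', ('e', 't'))
```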

Nevertheless, the computational correspondence between syntax and semantics provided by categorial grammars is very limited. First, for the correspondence between syntactic and semantic types to hold, words must be given syntactic types that are ad hoc, and even wrong: for instance, why should the type of a determiner depend on the constituent it is involved with? Second, the truth-conditional aspect of Montague semantics can be questioned from both a theoretical and a practical viewpoint. According to cognitive science, and even to common sense, it is unlikely that human beings develop all possible interpretations when they process and understand a sentence, and in practice such a construction of all models is definitely intractable. Third, strict compositionality does not hold, as Geach's famous donkey sentences show (e.g. "Every farmer who owns a donkey beats it").

In this project we address the first issue, which is a real limitation, and the third one, the latter in the section on discourse below. The first point is one of the motivations for studying the syntax/semantics interface of minimalist grammars. Indeed, they are rather close to categorial grammars and resource logic, and using this similarity we are able to extend the correspondence to a much richer grammatical formalism without resorting to unnatural syntactic types.

Lexical semantics and derivational morphology

Participants: Christian Bassac, Patrick Henry, Renaud Marlet

Keywords: lexical semantics, computational semantics

The generative lexicon is a way to represent the internal structure of the meaning of words and morphemes. Hence it is relevant, if not indispensable, for computing the semantic counterpart of morphological operations. The information that describes the sense of a word or morpheme is organized in three layers: the argument structure (related to logical semantics and syntax), the event structure, and the qualia structure.

The argument structure assigns types (in the type-theoretical sense) to the arguments encoded in the qualia structure, whether they are syntactically mandatory or optional. The event structure unfolds an event into several ordered sub-events and marks the most salient one. Events are typed according to Vendler's typology: state, process, and transition, the latter including achievements and accomplishments. The qualia structure relates the argument structure and the event structure through four roles: formal, constitutive, telic, and agentive.

This information, and its organization into generative lexicons, makes it possible to explain, for instance, polysemy and compositionality (in particular in compound words). This kind of model, which relates knowledge representation to linguistic organization, is especially useful for word sense disambiguation during automatic syntactic and semantic analysis.
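A possible data-structure sketch of such an entry, assuming Pustejovsky's classic example of "novel" (telic role: read; agentive role: write) and a hypothetical `coerce_event` helper: when an event-selecting verb such as "begin" takes an entity as its object, candidate event readings can be recovered from the object's qualia. The field names, the type label, and the helper are illustrative assumptions; the event structure is left empty for this noun.

```python
from dataclasses import dataclass, field


@dataclass
class Qualia:
    formal: str        # what kind of thing it is
    constitutive: str  # what it is made of
    telic: str         # its purpose or function
    agentive: str      # how it comes into being


@dataclass
class Entry:
    lemma: str
    args: dict         # argument structure: argument -> semantic type (illustrative labels)
    qualia: Qualia
    events: list = field(default_factory=list)  # ordered (Vendler class, sub-event) pairs


NOVEL = Entry(
    lemma="novel",
    args={"x": "phys_obj.info"},
    qualia=Qualia(formal="book", constitutive="narrative", telic="read", agentive="write"),
)


def coerce_event(verb, obj):
    """Hypothetical coercion: an event-selecting verb recovers events from the object's qualia."""
    return [(verb, obj.qualia.telic, obj.lemma), (verb, obj.qualia.agentive, obj.lemma)]


print(coerce_event("begin", NOVEL))  # [('begin', 'read', 'novel'), ('begin', 'write', 'novel')]
```

The two readings ("begin reading a novel", "begin writing a novel") come from the lexical entry itself rather than from two separate senses of "begin", which is the kind of polysemy account the generative lexicon aims at.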

Discourse and dialogue structure: computational semantics and pragmatics

Participants: Agnès Bracke, Joan Busquets, Gérard Huet, Alain Lecomte, Henri Portine

Keywords: Montague semantics, DRT, computational semantics

Montague semantics has some limitations. Two of them, which technically concern the context, can be overcome by using DRT (Discourse Representation Theory) and its variants. First, to construct the semantics of a piece of text, one has to take into account sequences of sentences, in discourse or dialogue, and handle the context that the text incrementally defines. Second, some constructs do not obey the strict compositionality of Montague semantics, since pronouns can refer to bound variables. For instance, a pronoun in the main clause can be bound in a conditional subordinate clause, as in "If a farmer owns a donkey, he beats it".

For these reasons, Discourse Representation Theory was introduced. This model defines an incremental view of the construction of discourse semantics. As opposed to Montague semantics, this construction is top-down and proceeds more by state change than by functional application, although lambda-DRT presents DRT in a Montague style.
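The state-change view can be sketched as a context that each sentence updates in place. This minimal Python sketch assumes a flat DRS (no sub-DRSs, hence no accessibility constraints from conditionals or quantifiers), sentences already analyzed into a relation and its arguments, and pronouns resolved to the most recent referent with a matching gender; the representation and the resolution strategy are hypothetical simplifications of DRT.

```python
from dataclasses import dataclass, field


@dataclass
class DRS:
    """A flat DRS: a universe of discourse referents and a list of conditions."""
    referents: list = field(default_factory=list)
    conditions: list = field(default_factory=list)
    gender: dict = field(default_factory=dict)  # referent -> gender, used for resolution

    def fresh(self, g):
        """Introduce a new discourse referent (done by indefinites)."""
        r = f"x{len(self.referents) + 1}"
        self.referents.append(r)
        self.gender[r] = g
        return r

    def resolve(self, g):
        """Link a pronoun to the most recent referent of the same gender."""
        for r in reversed(self.referents):
            if self.gender[r] == g:
                return r
        raise LookupError(f"no antecedent of gender {g!r}")


def update(drs, relation, args):
    """Process one sentence by changing the context DRS in place."""
    refs = []
    for arg in args:
        if arg[0] == "indef":      # ("indef", noun, gender)
            _, noun, g = arg
            r = drs.fresh(g)
            drs.conditions.append((noun, r))
        elif arg[0] == "pron":     # ("pron", gender)
            r = drs.resolve(arg[1])
        else:
            raise ValueError(f"unknown argument {arg!r}")
        refs.append(r)
    drs.conditions.append((relation, *refs))
    return drs


context = DRS()
update(context, "own", [("indef", "farmer", "m"), ("indef", "donkey", "n")])  # A farmer owns a donkey.
update(context, "beat", [("pron", "m"), ("pron", "n")])                      # He beats it.
print(context.referents)   # ['x1', 'x2']
print(context.conditions)  # [('farmer', 'x1'), ('donkey', 'x2'), ('own', 'x1', 'x2'), ('beat', 'x1', 'x2')]
```

After the second sentence, "he" and "it" are bound to the referents introduced by the indefinites of the first one: the kind of cross-sentential link that composing separate sentence meanings by function application does not provide.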

Type systems and functional programming for computational linguistics

Participants: Roberto Bonato, Gérard Huet, Yannick Le Nir, Richard Moot

Keywords: functional programming, proof assistant, logic programming, type theory

The team has developed expertise in logic and lambda-calculus, which are commonly used in computational linguistics:

One example is categorial grammars and their parsing-as-deduction paradigm, which use proofs in the Lambek calculus or in linear logic as syntactic trees.

Another example is Montague semantics, which uses Church's formulation of higher-order logic in the typed lambda calculus in order to implement Frege's principle of compositionality.

Finally, Discourse Representation Theory is also a logic, with a different syntax, and it can be combined with Montague semantics to obtain lambda-DRT.

Consequently it is quite natural to develop tools in programming languages relying on logic and type theory:

The Grail syntactic and semantic parser for Multimodal Categorial Grammars, defined and implemented by Richard Moot, is written in Prolog. It is the most developed and efficient software for categorial grammars, relying on recent developments in linear logic, in particular proof nets.

Under the supervision of Yannick Le Nir and Christian Retoré, a team of students implemented in OCaml the first steps of a platform for parsing and learning categorial grammars and related formalisms.

Gérard Huet developed the Zen toolkit for morphology in OCaml, using finite state technology. He obtained excellent performance, demonstrating the relevance of pure functional programming for computational linguistics.

Application Domains

Sanskrit philology

Participant: Gérard Huet

Keywords: Sanskrit, natural language processing, Indian studies, Internet

Sanskrit literature is extremely rich and is part of the world's cultural heritage. Nowadays, the Internet can give both specialists and curious readers access to it.

This kind of resource already exists for ancient Greek and Latin literature. For instance, Perseus (http://www.perseus.tufts.edu) provides online access to texts. A single click on a word analyzes it and retrieves its dictionary entry, possible meanings, usage statistics, etc.

The work described in the following sections makes such computational tools possible for Sanskrit; some of them are already developed and available on a web site (http://sanskrit.inria.fr). These tools efficiently and accurately assist the annotation of Sanskrit texts. In addition, a treebank of Sanskrit examples is under construction. Once the literature is annotated, this work should ultimately lead to a Sanskrit analogue of Perseus.

Towards French Sign Language (LSF) modelling and processing

Participants: Olivier De Langhe, Pierre Guitteny, Renaud Marlet, Henri Portine, Christian Retoré, Emilie Voisin

Keywords: sign language, deaf community, disabled, multimedia communication

After a worldwide prohibition of sign languages in deaf education, decided in 1880 (which lasted until the 1960s in the USA and until the 1980s in France), deaf people can now use sign language, and these languages have rather recently become the object of new studies and developments. A first aspect is the social recognition of sign language and of the deaf community; a second is the linguistic study of a language with a different modality (visual and gestural, as opposed to auditory and vocal); the third and most recent aspect, which relies on the second, is the need for sign language processing. A first goal is computer-aided learning of sign language for hearing people, and even for deaf people without access to sign language. More challenging objectives would be computer-aided translation from or to sign language, or direct communication in sign language.

Given the scarcity of linguistic studies on the syntax and semantics of sign languages (some exceptions concern American Sign Language), before we can apply our methodology, our first task is to determine the structure of the sentence, using our own competence as well as our relationship with the deaf community.

We intend to define methods and tools for generating sign language sentences. Note that a sentence in sign language goes through a sequence of representations, from a familiar grammatical description with agreement features and word/sign order, to a notation system such as SignWriting, or to a language for synthesizing 3D images and animations. Our expertise on the syntax/semantics interface is well suited to generating the grammatical representations.

A first application would be software for teaching sign language, like the CD-ROM Les Signes de Mano by IBM and IVT. Indeed, at present only dictionaries or sample sign language videos are available on computers, but no interactive software. Once developed, our generation tools could be useful for educational purposes.

Software

The Zen toolkit

Contact: Gérard Huet

Keywords: natural language processing, segmentation, computational morphology, finite state technology, functional programming

This software has been developed by Gérard Huet over many years, initially in the Cristal project-team, and it is clearly the most significant software presented by Signes.

It is a generic toolkit, extracted by Gérard Huet from his Sanskrit modeling platform, that supports the construction of lexicons, the computation of morphological derivatives and inflected forms, and the segmentation of phonetic streams modulo euphony (sandhi). This small library of finite state automata and transducers, called Zen for its simplicity, was implemented in an applicative kernel of Objective Caml called Pidgin ML. Documentation in literate programming style, produced with Jean-Christophe Filliâtre's program annotation tool Ocamlweb, is available. The Zen toolkit is distributed as free software (under the GPL license) on the Objective Caml Hump site. This development is a significant symbolic manipulation package written in pure functional programming, which shows the feasibility of developing symbolic applications with good time and space performance in the OCaml system, within a purely applicative methodology.

A number of uses of this platform outside the Cristal team are under way. For instance, a lexicon of French inflected forms has been implemented by Nicolas Barth and Sylvain Pogodalla in the Calligramme project-team at Loria. It is also used by Talana (University of Paris 7).

The algorithmic principles of the Zen library, based on the linear contexts data structure (zippers) and on the sharing functor (associative memory server), were presented in an invited lecture at the symposium Practical Aspects of Declarative Languages (PADL), New Orleans, Jan. 2003. An extended version was written as a chapter of the book "Thirty Five Years of Automating Mathematics", edited in honor of N. de Bruijn.
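The sharing principle can be sketched as bottom-up hash-consing: rebuild the trie from the leaves up and replace each node by a previously seen node that is structurally equal, so that common suffixes are stored once and the trie becomes a DAG, i.e. a compact automaton. This is a Python illustration of the idea under that reading, not Zen's OCaml sharing functor; a dictionary plays the role of the associative memory, and the word list is a toy example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    final: bool      # a lexicon word ends here
    children: tuple  # sorted (character, Node) pairs


def build(words):
    """Plain trie: every prefix gets its own node object."""
    final = "" in words
    groups = {}
    for w in words:
        if w:
            groups.setdefault(w[0], []).append(w[1:])
    return Node(final, tuple((c, build(rest)) for c, rest in sorted(groups.items())))


def share(node, memo):
    """Bottom-up hash-consing: structurally equal subtries become one object."""
    rebuilt = Node(node.final, tuple((c, share(child, memo)) for c, child in node.children))
    return memo.setdefault(rebuilt, rebuilt)


def count_nodes(node, seen=None):
    """Number of distinct node objects reachable from node."""
    seen = set() if seen is None else seen
    if id(node) in seen:
        return 0
    seen.add(id(node))
    return 1 + sum(count_nodes(child, seen) for _, child in node.children)


def accepts(node, word):
    """Membership test: follow one edge per character."""
    for ch in word:
        node = dict(node.children).get(ch)
        if node is None:
            return False
    return node.final


words = ["walk", "walked", "walks", "talk", "talked", "talks"]
trie = build(words)
dag = share(trie, {})
print(count_nodes(trie), count_nodes(dag))  # 15 7
```

Because nodes are immutable, sharing equal subtrees cannot change the recognized language; here the "w" and "t" branches collapse into a single subgraph, as do all final leaves.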

Sanskrit Site

Contact: Gérard Huet

Keywords: Sanskrit, electronic dictionary, tagging, segmentation

Gérard Huet's Sanskrit Site (http://sanskrit.inria.fr) provides a unique range of interactive resources for Sanskrit philology. These resources are built on, among other ingredients, the Zen toolkit (see above). The site receives thousands of visitors each month.

The declension engine gives the declension tables for Sanskrit substantives.

The conjugation engine conjugates verbs in the various tenses and moods.

The lemmatizer tags inflected words.

A dictionary lists inflected forms of Sanskrit words. Full lists of inflected forms, in XML format (with a specific DTD), are released as free linguistic resources for research purposes. This database, developed in collaboration with Prof. Peter Scharf of the Classics Department at Brown University, has been used for research experiments by the team of Prof. Stuart Shieber at Harvard University.

The Sanskrit Reader segments simple sentences in which the (optional) finite verb form occurs in final position. This reader enhances the hand-tagged Sanskrit reader developed by Peter Scharf, which lets students read simple texts at several levels: first in Devanagari script, then word by word, then with a word-by-word translation, and finally with a sentence-by-sentence translation.

The Sanskrit Parser eliminates many irrelevant pseudo-solutions (segmentations) listed by the Sanskrit Reader.

The Sanskrit Tagger is an assistant for tagging a Sanskrit corpus. Given a sentence, the user chooses among the possible interpretations listed by the morpho-syntactic tools and may save the corresponding unambiguously tagged sentence to disk. The process is as follows. The user types a sentence on his client machine, calls the parser remotely, and inspects the small number of surviving taggings; he may then examine each one to peruse its semantic analysis, presented as a pseudo-English paraphrase. Some non-determinism may remain: typically, a given segment may be lemmatized in several ways, through homonymy or morphological ambiguity. Each path in the semantic dependency matrix is shown with its bonus-malus score, and the user may select the one he prefers, yielding a completely disambiguated analysis which he may then store on his client machine as a hypertext document indexing into the Sanskrit Heritage Dictionary (our structured lexical database). This service has no equivalent worldwide.

Another ongoing project is the construction of a treebank of Sanskrit examples, in collaboration with Prof. Brendan Gillon of McGill University in Montreal.

Grail 3: natural language analysis with multimodal categorial grammar

Contact: Richard Moot

Keywords: parsing, syntactic analysis, semantic analysis, logic programming

Within the type-logical grammar paradigm, Multimodal Categorial Grammars (MMCG) are one of the richest approaches. Richard Moot carefully implemented Grail, an analyzer for MMCG that is the most complete natural language analysis system based on type-logical grammars with lexicons/grammars. Several languages are supported, with different levels of linguistic coverage: Dutch, English, French, Italian, and Hindi. Grail is distributed under the GNU LGPL.

The Grail parser/theorem prover for categorial grammars, originally developed at the University of Utrecht, has been rewritten from scratch, taking into account modern insights about proof nets, and it now requires only open-source software to run. This new release also includes theoretical improvements to the computation: parallel use of structural postulates (which introduce flexibility in word order, tree structure, etc.) and degrees of preference, in order to cope with the complexity of the analysis caused by the exponential number of choices. The parser has also been adapted to allow tight integration with the supertagger. In addition, several new strategies for reducing the search space have been implemented, significantly improving parsing performance.

DepLin

Contact: Kim Gerdes

Keywords: natural language syntactic analysis and generation

DepLin takes a syntactic dependency tree as input. A topological grammar translates this (unordered) tree into an ordered constituent tree, called the topological tree. In the next step, this tree is simplified into a three-level prosodic constituent tree (prosodic words, prosodic phrases, prosodic sentences). From this tree, a very simple sound output device can concatenate prerecorded sound files corresponding to the different prosodic words (with their prosodic markup). This allows auditory tests of the resulting sentences in constructed communicative contexts (question-answer sets). Building the prerecorded files is quite time-consuming; the system has been tested on a small Modern Greek vocabulary.
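As a much simplified stand-in for the topological step, the following sketch linearizes an unordered dependency tree with hypothetical ordering rules that assign each dependency relation a field relative to its head. DepLin's topological grammar builds an ordered constituent tree with richer fields and domains; this sketch only returns the resulting word order, and the relation names and field values are assumptions chosen for English.

```python
# Hypothetical ordering rules: the field of each dependency relation relative to
# its head, which sits in field 0. Negative fields precede the head.
FIELDS = {"det": -2, "amod": -1, "subj": -1, "obj": 1, "adv": 2}


def linearize(tree):
    """tree = (word, [(relation, subtree), ...]); returns the words in surface order."""
    word, deps = tree
    placed = [(0, 0, [word])]
    for i, (relation, subtree) in enumerate(deps, start=1):
        placed.append((FIELDS[relation], i, linearize(subtree)))
    placed.sort(key=lambda item: item[:2])  # by field, then by input order
    return [w for _, _, words in placed for w in words]


# Dependents are listed in an arbitrary order: the tree itself is unordered.
tree = ("sleeps", [("adv", ("soundly", [])),
                   ("subj", ("cat", [("amod", ("black", [])), ("det", ("the", []))]))])
print(linearize(tree))  # ['the', 'black', 'cat', 'sleeps', 'soundly']
```

Each subtree is linearized recursively and placed as a contiguous block, so the output corresponds to a projective ordering of the dependency tree.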

DepLin was developed by Kim Gerdes. It is distributed as free software (GPL) and, apart from internal use in the Signes group (in particular for German and Greek), it is mainly used at the University of Paris 7 for the development of different grammars (in particular for Arabic and French).

Corpus Arborator

Contact: Kim Gerdes

Keywords: editor, corpus annotation, functional dependency

An editor for corpora with functional dependency annotation was developed by Kim Gerdes in collaboration with ERSS, Toulouse. This "corpus arborator" is distributed under the GPL and used in Bordeaux and at ERSS Toulouse.

LeFFF

Contact: Lionel Clément

Keywords: lexicon, inflected form, French, lemma, morphological features

LeFFF (Lexique des Formes Fléchies du Français, i.e. Lexicon of French Inflected Forms) offers, under the LGPL For Linguistic Resources, a wide-coverage lexicon of inflected forms for French, which associates each form with its lemma and its morphological features (other features are under construction). It was developed by Lionel Clément, Benoît Sagot and Bernard Lang; Lionel Clément co-developed it before he joined the SIGNES group. It is available at http://www.lefff.net/.

XLFG

Contact: Lionel Clément

Keywords: parser, Lexical Functional Grammar, LFG

XLFG is a research parser prototype implementing the Lexical Functional Grammar (LFG) formalism. It is used for teaching in various universities and is distributed as free software (http://dept-info.labri.fr/~clement/xlfg/). It was developed by Lionel Clément before he joined the SIGNES group.

Lexed

Contact: Lionel Clément

Keywords: lexicaliser, dictionary search

Lexed is a lexicalizer: it retrieves dictionary entries from a string. Its finite-automaton-based algorithm is particularly fast and offers a good alternative to hash tables for large dictionaries. Lexed is distributed for Unix platforms under the GPL. It was developed by Lionel Clément before he joined the SIGNES group.

Yab

Contact: Lionel Clément

Keywords: compiler compiler, parsing ambiguities, parsing sharing

Yab is a compiler-compiler similar to YACC. With Yab it is possible to handle ambiguities and to share semantic constructions between different analyses. Yab is distributed under the GPL. It was developed by Lionel Clément before he joined the SIGNES group.

Tokenizer

Contact: Lionel Clément

Keywords: text segmentation, ambiguity, compound words

This software segments a text into tokens. Ambiguity between simple and compound words is represented by a directed acyclic graph (DAG). It was developed by Lionel Clément before he joined the SIGNES group and is part of Lexed (see above).
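A minimal sketch of such a DAG, assuming the text has already been split into simple words and that compounds come from a hypothetical list of multi-word forms: nodes are positions between simple words, and each edge is one possible token, so a compound and its parts coexist as alternative paths. This illustrates the representation only; it is not the tokenizer's actual algorithm or interface.

```python
def token_dag(words, compounds):
    """Nodes are positions 0..n between simple words; an edge (i, j, form) is one token."""
    n = len(words)
    edges = [(i, i + 1, words[i]) for i in range(n)]      # simple-word readings
    for compound in compounds:
        parts = compound.split()
        k = len(parts)
        for i in range(n - k + 1):
            if words[i:i + k] == parts:
                edges.append((i, i + k, compound))       # compound-word reading
    return sorted(edges)


def tokenizations(edges, n):
    """Expand the DAG into all paths from node 0 to node n (exponential in general)."""
    outgoing = {}
    for i, j, form in edges:
        outgoing.setdefault(i, []).append((j, form))

    def paths(i):
        if i == n:
            return [[]]
        return [[form] + rest for j, form in outgoing.get(i, []) for rest in paths(j)]

    return paths(0)


words = "il mange une pomme de terre".split()
dag = token_dag(words, ["pomme de terre"])
for reading in tokenizations(dag, len(words)):
    print(reading)
# ['il', 'mange', 'une', 'pomme', 'de', 'terre']
# ['il', 'mange', 'une', 'pomme de terre']
```

Expanding every path, as done here for display, can be exponential in the number of ambiguous spans, which is why later processing steps are better off consuming the DAG itself.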

Tree-drawing package

Contact: Maxime Amblard

Keywords: tree drawing, Minimalist Grammars

Maxime Amblard developed a tree-drawing package in ML. This package is included as a contribution in the open-source parser for Minimalist Grammars developed and distributed by John Hale (http://www.linguistics.ucla.edu/people/stabler/hale/index.html).

Experiments in categorial grammars

Participants: Roberto Bonato, Richard Moot (contact), Christian Retoré

Keywords: parsing, grammatical inference

This software, CGTools, is an academic prototype. It combines two Travaux d'Etude et de Recherche (student research projects) by fourth-year students, Véronique Moriceau and Jérôme Pasquier (Université de Nantes, 2002), which were reorganized and extended by Thomas Poussevin, Jean-François Deverge, Fahd Haiti and Anthony Herbé (Université Bordeaux 1, 2003). It is written in OCaml, with a Tcl/Tk interface, and its input and output formats are XML files (DAGs representing analyses, proofs and trees).

Presently, the following algorithms are implemented:

learning of categorial grammars from structured sentences;

inter-translation in any possible direction between AB categorial grammars, Lambek grammars, context-free grammars in Greibach normal form, and context-free grammars in Chomsky normal form;

parsing of categorial grammars by proof search;

parsing of context-free grammars with the Cocke-Kasami-Younger algorithm.
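The last item can be illustrated with a standard CKY recognizer for a context-free grammar in Chomsky normal form. This Python sketch, with a hypothetical toy grammar and dictionary-based rule tables, is not CGTools' OCaml code: `table[i][j]` holds the nonterminals that derive words i to j-1, and cells are filled by increasing span length.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form.
LEXICAL = {"the": {"Det"}, "a": {"Det"}, "dog": {"N"}, "cat": {"N"}, "sees": {"V"}}  # A -> w
BINARY = {("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}           # A -> B C


def cky_recognize(words, lexical, binary, start="S"):
    """Return True if `words` is derivable from `start`."""
    n = len(words)
    # table[i][j] = set of nonterminals deriving words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexical.get(w, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two daughters
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= binary.get((b, c), set())
    return start in table[0][n]


print(cky_recognize("the dog sees a cat".split(), LEXICAL, BINARY))  # True
print(cky_recognize("dog the sees".split(), LEXICAL, BINARY))        # False
```

For a fixed grammar, filling the table takes time cubic in the sentence length, since each of the O(n²) cells examines O(n) split points.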

New Results

Segmentation and inflectional morphology

Gérard Huet continued his work on developing a computational linguistics platform adapted to Sanskrit, based on applicative programming in OCaml.

The main effort in 2005 concerned curbing the overgeneration of the segmenter by a semantic analysis. Each segmentation solution, represented as a list of morphological items (inflected words tagged with their lemmatization as a root entry, together with a morphological generator carrying its various features), is translated into a sequence of semantic role scripts. Verbal forms become sites of actions/situations, expecting complements as role assignments. These roles depend on the regime of the verb, given its voice. For instance, a transitive verb in the active voice demands a subject in the nominative and an object in the accusative for its role saturation. Dually, nominal phrases provide the corresponding roles. Matching opposite polarities gives rise to a constraint satisfaction problem over the role features. In Western linguistics, this corresponds to the construction of the dependency structure in the sense of Tesnière, as computed by systems based on dependency grammars (with analogues in platforms based on feature logic, such as HPSG or LFG). In the terminology of Indian linguists such as Pāṇini, we perform kāraka analysis. The constraint satisfaction problem is similar to the proper typing of categorial grammar parse trees, or to the construction of a proof net in commutative linear logic. However, non-linear phenomena are frequent: for instance, agreement between an adjective and the noun it qualifies is a kind of contraction.
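The matching of demanded and provided roles can be pictured with a deliberately simplified Python sketch. The regime table, the case labels and the scoring rule (one bonus per satisfied role, one malus per missing or superfluous role) are hypothetical illustrations, not the actual constraint engine; the passive entry encodes the usual Sanskrit pattern of an instrumental agent and a nominative patient.

```python
from collections import Counter
from typing import List

# Hypothetical regimes: (valency, voice) -> roles demanded, written as case labels.
REGIMES = {
    ("intransitive", "active"): Counter({"nom": 1}),
    ("transitive", "active"): Counter({"nom": 1, "acc": 1}),
    ("transitive", "passive"): Counter({"ins": 1, "nom": 1}),
}


def saturation_score(valency: str, voice: str, provided_cases: List[str]) -> int:
    """Match the roles a verb demands against the cases its nominal phrases provide.

    Returns a hypothetical bonus-malus score: +1 per matched role,
    -1 per demanded role left unfilled, -1 per superfluous nominal phrase.
    """
    demanded = REGIMES[(valency, voice)]
    provided = Counter(provided_cases)
    matched = sum((demanded & provided).values())   # multiset intersection
    missing = sum((demanded - provided).values())   # demands nobody satisfies
    surplus = sum((provided - demanded).values())   # phrases with no demand
    return matched - missing - surplus
```

For an active transitive verb, a nominative plus an accusative scores 2, whereas two nominatives score -1 (one match, one missing accusative, one superfluous nominative). Counting with multisets rather than sets is what lets the sketch penalize duplicated roles.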

The constraint satisfaction engine proceeds as a sequence of stream processors applied to the tagged sentence stream, going from right to left. Tool words are treated as postfix stream combinators: they may only compute over what precedes them in the utterance. From this work arises the notion of a linguistic tool as a feature structure stream transducer. Pronouns are linguistic tools in this sense, since their purpose is to link to their anaphoric antecedent. In Sanskrit, a case study of coordination led to the implementation of the ca tool. This postfix conjunction merges antecedent noun phrases by means of three semantic upper bound operations, for gender, number, and person respectively. For instance, it transduces the sequence of tagged items for ``two girls and one boy'' into one tag for ``several male persons'', paving the way for the recognition of this compound item as a proper subject of a verb conjugated in the plural. This iteration of stream combinators computes a compound bonus-malus score.
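The merging performed by the ca tool can be sketched as a fold over the antecedent tags. In this Python illustration the precedence orders are assumptions chosen so that the example from the text comes out right (masculine over feminine; first person over second over third), and number is combined by adding cardinalities capped at plural (singular = 1, dual = 2, plural = 3 or more). The class `Tag` and the functions `merge` and `ca` are hypothetical names.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List

# Assumed precedence orders: the higher rank wins when conjuncts disagree.
GENDER_RANK = {"fem": 0, "masc": 1}
PERSON_RANK = {3: 0, 2: 1, 1: 2}
# Number is treated as a cardinality capped at 3 (= plural).
COUNT = {"sg": 1, "du": 2, "pl": 3}
NUMBER = {1: "sg", 2: "du", 3: "pl"}


@dataclass(frozen=True)
class Tag:
    gender: str
    number: str
    person: int


def merge(a: Tag, b: Tag) -> Tag:
    """Combine two conjunct tags feature by feature."""
    return Tag(
        gender=max(a.gender, b.gender, key=GENDER_RANK.__getitem__),
        number=NUMBER[min(COUNT[a.number] + COUNT[b.number], 3)],
        person=max(a.person, b.person, key=PERSON_RANK.__getitem__),
    )


def ca(conjuncts: List[Tag]) -> Tag:
    """Postfix conjunction: fold all antecedent noun-phrase tags into one tag."""
    return reduce(merge, conjuncts)
```

Applied to a feminine dual third-person tag (two girls) and a masculine singular third-person tag (one boy), `ca` returns a masculine plural third-person tag, that is, several male persons. Because each feature combination is associative, the fold gives the same result however the conjuncts are grouped.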

The constraint engine, still under design, demonstrates a remarkable filtering capacity. Sentences with several hundred potential phonemic segmentations are very often processed successfully, in the sense that most segmentation candidates are rejected as dubious (their bonus-malus score falling below some threshold) while the intended meaning is retained. Rejection rates of 98% are frequent. This is rather encouraging, and the prototype system is expected to be released by December 2005 as a Sanskrit corpus tagging assistant. This application is entirely distributed as a Web service. The user types in a sentence on his client machine, calls the parser remotely and inspects the small number of surviving taggings; he may then inspect each one in order to peruse the semantic analysis, presented as a pseudo-English paraphrase. Some non-determinism may remain: typically, a given segment may be lemmatized in several ways, either by homonymy or by morphological ambiguity. Each path in the semantic dependency matrix is shown with its bonus-malus score, and the user may select the one he prefers, yielding a completely disambiguated analysis which he may then store on his client machine as a hypertext document indexing into the Sanskrit Heritage Dictionary (our structured lexical database). This service has no equivalent worldwide.

This Sanskrit platform was presented at the ATALA workshop on "Traitement automatique des langues anciennes", on May 21st in Paris. It was the topic of an invited lecture at the 5th International Conference on Logical Aspects of Computational Linguistics (LACL 2005) in Bordeaux on April 28th.

A new applicative model for finite state machines

This work builds on the Zen toolkit for lexical processing designed by Gérard Huet and distributed as a free software OCaml library. It investigates a notion of mixed automaton, or aum, first presented in 2003 in his Automata Mista article for the Manna Festschrift. This work is being pursued as a general model for the modular construction of finite state machines, possibly non-deterministic and possibly transducing their input onto an output tape, in a purely applicative inductive data type whose operations model constructions of regular relations.

This year a new generic layer was abstracted for compiling control for the reactive engine, implementing an original notion of modular transducer. The user provides a system of regular expressions over phases, as well as specific aum recognizers for each phase. A meta-programming tool, implementing the Berry-Sethi algorithm for compiling regular expressions, yields a sequential dispatcher tailored to the specific application, as a stand-alone ML module linked as a plug-in to the generic Zen toolkit. This was the topic of the summer internship of Benoît Razet for his 2nd year Master project at University Paris 6. This work led to the release of version 2 of the Zen computational linguistics toolkit, as a free software Pidgin ML library. A joint article on the design of modular transducers has been submitted for publication.
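The Berry-Sethi construction mentioned above can be sketched in Python (the Zen toolkit itself generates an ML module). The sketch computes, for each symbol position of the regular expression, the nullable, first, last and follow sets, and builds the resulting position (Glushkov) automaton, which has one state per position plus an initial state and no epsilon transitions. The tuple encoding of regular expressions and the phase names `noun` and `verb` are hypothetical; in the Zen setting, the symbols would be phases, each handled by its own aum recognizer.

```python
from typing import Dict, List, Set, Tuple

# Regular expressions as nested tuples (a hypothetical encoding):
# ("sym", a) | ("cat", r, s) | ("alt", r, s) | ("star", r)


def berry_sethi(regex) -> Tuple[Dict[int, Dict[str, Set[int]]], Set[int]]:
    symbols: List[str] = []           # symbols[p - 1] is the letter at position p
    follow: Dict[int, Set[int]] = {}

    def walk(r) -> Tuple[bool, Set[int], Set[int]]:
        """Return (nullable, first, last) of r and fill `follow` as a side effect."""
        kind = r[0]
        if kind == "sym":
            symbols.append(r[1])
            p = len(symbols)
            follow[p] = set()
            return False, {p}, {p}
        if kind == "cat":
            n1, f1, l1 = walk(r[1])
            n2, f2, l2 = walk(r[2])
            for p in l1:              # a last position of r may be followed by a first of s
                follow[p] |= f2
            return n1 and n2, (f1 | f2) if n1 else f1, (l1 | l2) if n2 else l2
        if kind == "alt":
            n1, f1, l1 = walk(r[1])
            n2, f2, l2 = walk(r[2])
            return n1 or n2, f1 | f2, l1 | l2
        if kind == "star":
            _, f, l = walk(r[1])
            for p in l:               # loop back from the end of r to its start
                follow[p] |= f
            return True, f, l
        raise ValueError(f"unknown node {kind!r}")

    nullable, first, last = walk(regex)
    # State 0 is initial; state p > 0 means "position p has just been read".
    delta: Dict[int, Dict[str, Set[int]]] = {p: {} for p in range(len(symbols) + 1)}
    for q in first:
        delta[0].setdefault(symbols[q - 1], set()).add(q)
    for p, qs in follow.items():
        for q in qs:
            delta[p].setdefault(symbols[q - 1], set()).add(q)
    finals = set(last) | ({0} if nullable else set())
    return delta, finals


def accepts(automaton, word: List[str]) -> bool:
    """Simulate the (possibly non-deterministic) position automaton."""
    delta, finals = automaton
    current = {0}
    for a in word:
        current = {q for p in current for q in delta[p].get(a, set())}
    return bool(current & finals)
```

For instance, `berry_sethi(("cat", ("cat", ("sym", "noun"), ("star", ("sym", "noun"))), ("sym", "verb")))` builds a four-state automaton accepting one or more `noun` phases followed by a `verb` phase. A further subset construction would turn this automaton into the kind of deterministic sequential dispatcher described above.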

Topological syntax

Kim Gerdes, Sylvain Kahane (University of Paris 10) and Hi-Yon Yoo (University of Paris 7) discussed the implications of replacing the classical ``morphological'' structure of the Meaning-Text Framework with the topological constituent tree. They showed that two types of topological structures are frequently found: rather descriptive structures with multiple embeddings, and flattened structures that form the templates actually used in language production. The variety of flattened structures can then be explained as a combination of different embeddings of simpler structures. The theoretical question that remains for integrating these structures into the (linear) Meaning-Text Model is which of them actually appears as the intermediate representation between syntax and phonology.

Calling German a ``V2'' language is a simplification. In many cases, it is possible to place two constituents before the finite verb. The reasons to do so seem to depend on the semantic and the communicative structure of the sentence, and very little on the syntactic functions of the elements. Kim Gerdes showed that this apparent contradiction with the Meaning-Text modularity separating semantics from topology can be resolved by exploring the power of the communicative markup on the syntactic dependency tree .

Categorial syntax: super-tagging and optimised parsing

On the basis of the Spoken Dutch Corpus (CGN, a database containing syntactic annotations for a million words of contemporary spoken Dutch), Richard Moot experimented with several strategies for automatically extracting, at different levels of detail, a type-logical treebank representing a lexicon for categorial grammars.

The size of the extracted lexicons, with an average of around 50 different formulas possible for each word in a sentence, poses a considerable challenge for parsing with the extracted grammars. By adapting methods used for part-of-speech tagging (notably maximum-entropy models, which currently outperform other models) to these much richer lexical items, an approach called supertagging, it is possible to find the most likely sequences of lexical lookups for a sentence. Depending on the level of detail maintained in the lexicon, the number of different formulas varies between 1000 and 7000, while the accuracy of supertag disambiguation varies between 72% and 80%, which is comparable to results obtained for TAGs on the (presumably cleaner) Penn Treebank.
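To make the idea of finding the most likely sequence of lexical lookups concrete, the following Python sketch decodes a supertag sequence with the Viterbi algorithm. It deliberately substitutes a simple first-order model (a per-word tag probability combined with a tag-bigram probability) for the maximum-entropy models mentioned above, and all categories and probabilities in the toy example are invented for illustration; `viterbi_supertags` is a hypothetical name.

```python
import math
from typing import Dict, List, Tuple

START = "<s>"


def viterbi_supertags(words: List[str],
                      candidates: Dict[str, List[str]],
                      emit: Dict[Tuple[str, str], float],
                      trans: Dict[Tuple[str, str], float],
                      floor: float = 1e-6) -> List[str]:
    """Return the highest-scoring supertag sequence for `words`.

    candidates[w]: the formulas the lexicon allows for word w;
    emit[(w, t)]: P(t | w); trans[(prev, t)]: P(t | prev), with START first.
    Missing probabilities are floored to avoid log(0).
    """
    def logp(x: float) -> float:
        return math.log(max(x, floor))

    # best[t] = (log score, path) of the best sequence so far ending in tag t
    best: Dict[str, Tuple[float, List[str]]] = {START: (0.0, [])}
    for w in words:
        new_best: Dict[str, Tuple[float, List[str]]] = {}
        for t in candidates[w]:
            scored = [(s + logp(trans.get((prev, t), 0.0)) + logp(emit.get((w, t), 0.0)), path)
                      for prev, (s, path) in best.items()]
            score, path = max(scored, key=lambda sp: sp[0])
            new_best[t] = (score, path + [t])
        best = new_best
    return max(best.values(), key=lambda sp: sp[0])[1]


# Toy Dutch example ("Jan slaapt", Jan sleeps); categories and numbers are invented.
CANDIDATES = {"Jan": ["np", "n"], "slaapt": ["np\\s", "(np\\s)/np"]}
EMIT = {("Jan", "np"): 0.9, ("Jan", "n"): 0.1,
        ("slaapt", "np\\s"): 0.7, ("slaapt", "(np\\s)/np"): 0.3}
TRANS = {(START, "np"): 0.6, (START, "n"): 0.4,
         ("np", "np\\s"): 0.5, ("np", "(np\\s)/np"): 0.5,
         ("n", "np\\s"): 0.1, ("n", "(np\\s)/np"): 0.1}
```

On the toy data the decoder selects `np` for *Jan* and `np\s` for *slaapt*. Keeping only the best path per candidate formula makes decoding linear in sentence length and quadratic in the number of formulas per word, which matters when each word has around 50 candidates.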

Modelling French Syntax

The types in a categorial grammar form a hierarchy (using only the derivability relation between them). This hierarchy can be exploited to treat various linguistic phenomena, such as French object clitics (even with clitic climbing), and to compute correct semantic representations in Montague style whether or not control phenomena occur.

Henri Portine showed that the problem of relative clauses in everyday French is often blurred by the conflation of two problems: on the one hand, the existence of an object in a corpus and the way it can be recovered according to the properties of the corpus; on the other hand, the duality between relative clauses and complement clauses. He also showed that analysing these everyday French relative clauses as relative clauses which are in fact complement clauses rests on a conception of syntax as pure machinery. He proposed an analysis of this type of relative clause which opens the way to a notional conception of antecedents.

Deductive Grammars within a Proof Assistant

Houda Anoun is extending her implementation of categorial grammars in Coq to categorial minimalist grammars. This interactive proof search (i.e. parsing) makes it possible to test and explore the properties of several variants of these mixed grammars proposed by Lecomte, Retoré and Vermaat.

Minimalist syntax

Maxime Amblard proved the existence of a Minimalist Grammar which generates the counting dependency languages $L_m = \{1^n 2^n \cdots m^n \mid n \in \mathbb{N}\}$. He also presented an algorithm for the construction of the lexicon $\mathit{Lex}_m$ producing these languages. This class of languages, which models sentences such as ``Peter, Mary and Charles had respectively 14, 12 and 6 in math, history and sport'', belongs to the context-sensitive languages in the Chomsky hierarchy. This result generalizes to any $m \in \mathbb{N}$ Stabler's presentation for $m = 5$. It also generalizes similar earlier results by providing a simpler grammar and by handling such nested counter languages.
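For readers less familiar with counting dependencies, the following Python sketch simply decides membership in $L_m$ for a word given as a list of integer symbols; it says nothing about how a Minimalist Grammar generates the language. It assumes, following the usual French convention, that $\mathbb{N}$ contains 0, so the empty word belongs to every $L_m$. The name `in_counting_language` is hypothetical.

```python
from itertools import groupby
from typing import Sequence


def in_counting_language(word: Sequence[int], m: int) -> bool:
    """Membership in L_m = {1^n 2^n ... m^n | n in N}, assuming 0 is in N."""
    if len(word) == 0:
        return True  # the case n = 0
    # Collapse the word into maximal blocks (symbol, block length).
    blocks = [(sym, len(list(group))) for sym, group in groupby(word)]
    symbols = [sym for sym, _ in blocks]
    lengths = {length for _, length in blocks}
    # The blocks must be exactly 1, 2, ..., m, in order, all of the same length.
    return symbols == list(range(1, m + 1)) and len(lengths) == 1
```

In the ``respectively'' example, $m = 3$: the names, the grades and the subjects form three blocks that must have the same length $n$ and be matched position by position.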

On the other hand our team also provides a criticism of grammars with movement. Many concepts like ``movement'', ``scrambling'', ``gapping'', ``right node raising'', etc., have their origin in the choice of constituent structures for the representation of syntax. Kim Gerdes explored the historical work on the development of X-bar phrase structures as the central syntactic representation. He showed how this choice came into being and how it persisted against all successful implementations of simple alternatives .

Syntax semantics interface for generative grammars

There are many quantifier scope ambiguities in natural languages. The different possible readings of a sentence can be expressed with CLLS (Constraint Language for Lambda Structures), which describes underspecified lambda terms. Given a syntactic analysis with Minimalist Categorial Grammars, Amblard described how to extract relevant semantic representations with CLLS, discarding spurious cases.

Roberto Bonato has defined an incremental algorithm for computing the binding relationship between words, especially when the bound term is a pronoun (possible or impossible coreference with its antecedent). Until now this relation had not been computed incrementally: it was defined as a set of constraints on a complete analysis. He is also exploring alternative interpretations of the traditional Principles of Binding Theory, with special attention to Reinhart's 1983 work Anaphora and Semantic Interpretation and Reinhart and Reuland's 1993 Reflexivity. He has integrated these approaches into a unified computational framework that looks very promising for deriving, from general computational principles, some of the major stipulations that these approaches have produced over the last 30 years of linguistic and formal semantics research.

Lexical semantics: explanatory accounts

Christian Bassac showed with Pierrette Bouillon that the availability of various types of anaphoric reference to the modifier in [N1]mod [N2]head compounds (via a definite determiner NP, a possessive determiner NP or a demonstrative determiner) is predictable from the type of the relationship R that holds between N1 and N2 and from the role in which it is encoded. The fact that no misalignment could be found in the data of the three languages considered (English, French and Turkish) suggests that these predictions rest on deep-rooted aspects of the semantics of compounds (they are, so to speak, qualia-driven) and can probably be generalized to other languages.

Christian Bassac defended a strong conception of compositionality for English root compounds and showed that the analyses that have prevailed so far, such as Downing's (which argue for a completely unconstrained and unpredictable meaning of root compounds), are both overly pessimistic and poorly motivated linguistically.

Christian Bassac analysed with Mehmet Ciçek the morphology of Turkish verbal and nominal predication to show that they are not opposed but both integrate a copula, which is sometimes manifested only by second articulation phenomena such as word stress . The results of this contribution challenge the claims of Pollock's theory of functional heads and plead for lexical rules to build the highly complex verbal forms of Turkish.

Henri Portine shed light on the discrepancy between the pair polysemy/homonymy considered from a diachronic point of view and the same pair considered as a cognitive fact. He showed that polysemy is a chain of relations, and that cognitive homonymy is based on the breaking of this chain, which shows that it is radically different from diachronic homonymy, the naming of the absence of a relation. From a cognitive point of view, the pair polysemy/homonymy is relevant in lexical semantics.

Lexical semantics: formalisation of the generative lexicon

Most works on the Generative Lexicon (GL) are informal, leading to results that are more descriptive than amenable to automation. A working group in SIGNES (C. Bassac, P. Henry, R. Marlet, C. Retoré, as well as J. Vanier, an intern from the Ecole Centrale de Paris) has started a foundational effort to formalize GL. The goal is, given a parsed sentence, to construct possible interpretations in the form of logical formulas along the lines of Montague semantics, but focusing on lexical information. A master thesis has been written along these lines, but no article has been submitted yet.

The entries of GL have been formalized, with attention to variable binding and typing. This also includes role qualification for variables, dotted types, as well as subentries for the "telic" quale ("trigger" and "result" features). The type hierarchy and the set of primitive predicates are not fixed in the formalization: they are considered as parameters, to be defined along with any given lexicon instance.

A general framework for constituent composition has been defined, and the main generative mechanisms, such as coercion and co-composition, have been specified as formal algorithms. The composition mechanisms have also been extended to depend on a semantic distance between predicates, enabling combination modulo predicate similarity as well as ranking between different interpretations.

Another facet of this formalization work concerns how to abstract semantic issues that are irrelevant to GL, such as anaphora resolution or quantification originating from determiners, and how to nonetheless recover this information in the final formulas. This abstraction cleans up the syntactic representation of the sentence, only keeping simple word associations, to be used as input to the GL combination mechanisms. This leaves the focus on what GL is good at, i.e., to define how words associate to construct new meanings.

Syntax, semantics, discourse: VP ellipsis

Joan Busquets has been working on VP ellipsis and its various semantic and discourse constraints. A comparative analysis of VP ellipsis and stripping in Catalan and English shows that the two types of construction need to be clearly distinguished in Catalan, as they are in English. The evidence comes from the analysis of the so-called information packaging. On the one hand, stripping constructions are under the control of focus by means of parallel foci. On the other hand, VP-ellipsis constructions are not constrained by information packaging, although this notion might help to disambiguate the target in certain cases. These results are found in . A more fine-grained analysis, including some anaphoric discourse properties of both constructions, will be addressed in the final published version.

The anaphoric properties of the Catalan expression fer-ho (do it) in elliptical contexts have been explored from a semantic and discourse point of view. We describe the set of semantic constraints that the form fer-ho imposes on the complements it substitutes for. Moreover, we provide relevant linguistic examples supporting an analysis of these contexts as narrow ellipsis, as opposed to VP ellipsis as wide ellipsis.

Finally, the interaction among negation, VP ellipsis, and presupposition has been considered from a dynamic discourse semantics approach (Segmented Discourse Representation Theory). By means of a set of constraints related to the Contrast discourse relation, we are able to explain the difference between factive and non-factive verbs in elliptical contexts when the negation is the only remnant in the elliptical or target proposition.

Formal semantics of vague predicates

Most natural language quantifiers are vague, e.g. in French: ``quelques, peu, un peu, beaucoup, certains'' (some, few, a little, many, certain). Moreover, they license different kinds of inference: ``logical'' consequences and implicatures, in the Gricean sense. Using the Logic of Partial Information, Areski Naït-Abdallah (University of Brest) and Alain Lecomte gave a rigorous account of some pragmatic notions formerly studied by the linguist O. Ducrot.

Modeling French sign language (LSF) grammar

Pierre Guitteny studied the diathesis in LSF and proved the existence of passive or inverse constructions in LSF on the basis of a corpus study .

Pursuing the work of Olivier De Langhe, Pierre Guitteny, Henri Portine and Christian Retoré, Emilie Voisin carried out further experiments on sign order in LSF. She observed that verb inflection, if any, can be influenced by the subject, the object, as well as personal transfer. Her analysis showed that under some circumstances, in particular when verbal inflection is influenced by the object, the sign order is SOV rather than OSV.

Contracts and Grants with Industry

PicoPeta

Gérard Huet ported his Sanskrit processing workbench as an application for the Simputer, a hand-held computing device running Linux developed in India. He visited the PicoPeta corporation in Bangalore, one of the manufacturers of the Simputer, in order to initiate a possible technology transfer towards a pocket Sanskrit machine.

Other Grants and Activities

Regional research programs

The Aquitaine region is funding (together with INRIA and LaBRI-CNRS) a project on sign language processing and a PhD grant on the same topic. With an accurate video recorder and the corresponding software and computer, our team should be able to build a very high-quality corpus of spontaneous sign language as well as of guided experiments. Contact: Christian Retoré

National research programs

Groupement de Recherche C.N.R.S. 2521 Sémantique et modélisation

Signes is one of the fifteen research teams of the Groupe de Recherches 2521 (C.N.R.S.) directed by Francis Corblin (Université Paris IV). This research program is divided into Opérations: Modèles et formats de représentation pour la sémantique, Les Modèles à l'épreuve des données, Sémantique et corpus, Les interfaces de la sémantique linguistique, Sémantique computationnelle. The Signes team is part of the latter two operations, which could be translated as Interfaces of linguistic semantics and Computational semantics.

Programme Interdisciplinaire du C.N.R.S. Traitement des Connaissances, Apprentissage et Nouvelles Technologies de l'Information et de la Communication

Alain Lecomte is supervising a project, VALI (Vers des assistants lecteurs intelligents, Towards intelligent reading assistants), in this setting. It aims to develop tools that help new researchers grasp the contents of a research article. To this end, the contents can be organized using linguistic theories such as SDRT, and logical tools such as the proof assistant Coq can be applied to deduce relationships between parts of the contents.

European research programs

CoLogNet: European network of Excellence on Computational logic

The Signes team is an active node of this network, in particular of its section 6, computational logic for natural language processing, headed by Michael Moortgat. The contact person is Gérard Huet.

UIL-OTS Utrecht / Signes (Action intégrée van Gogh)

A research program entitled Generative grammar and deductive systems for the processing of natural language syntax and semantics was approved for 2004 and renewed for 2005. The other team in this bilateral research program is Computational linguistics and logic, directed by Michael Moortgat at the Utrecht Institute of Linguistics. The Dutch contact is Willemijn Vermaat, and the French one is Christian Retoré.

Pompeu Fabra – ERSS/Signes – Paris 7 (PICS France Catalonia, CNRS)

Enric Vallduvi, Joan Busquets, Pascal Amsili, Etude comparative des connecteurs et des marqueurs discursifs dans le cadre d'une sémantique dynamique du discours.

Dissemination

Activism within the scientific community

Honours

Gérard Huet has been a member of the Académie des sciences since November 2002.

Gérard Huet was invited to become a member of the International Advisory Board of NII (National Institute of Informatics) in Tokyo, Japan. He participated in the first meeting of this board on June 2nd, and was subsequently invited to write an opinion column for NII's journal.

Editorial boards

Alain Lecomte has been on the editorial board of the journal TAL – Traitement Automatique des Langues, Editions Hermès, Paris, since August 2001.

Alain Lecomte and Christian Retoré are on the editorial board of the book series Research in Logic and Formal Linguistics, Edizione Bulzoni, Roma, since 1999.

Henri Portine is on the editorial board of the journal ALSIC – Apprentissage des Langues et Systèmes d'Information et de Communication

Christian Retoré has been a reviewer for Mathematical Reviews since October 2003.

Christian Retoré has been editor in chief of the journal TAL – Traitement Automatique des Langues, Editions Hermès, Paris, since April 2004 (on the editorial board since 2001).

Program committees of conferences

Maxime Amblard and Renaud Marlet chaired the LACL Student Session committee, 2005.

Christian Bassac was on the program committee of International Morphology Conference, Toulouse, December 2005.

Christian Bassac was on the program committee of the 3rd International Workshop on Generative Approaches to the Lexicon, Geneva, March 2005.

Christian Bassac was on the program committee of the student session of Logical Aspects of Computational Linguistics 2005 (Bordeaux)

Joan Busquets was on the program committee of the Symposium sur l'étude du Sens : Exploration et Modélisation 2005 (Biarritz)

Joan Busquets, Richard Moot and Christian Retoré were on the committee of Logical Aspects of Computational Linguistics 2005 (Bordeaux)

Richard Moot was on the reading committee of TALN 2006.

Christian Retoré was on the program committee of ESSLLI 2005 (Edinburgh).

Christian Retoré is on the reading committee of Human Language Technology / Empirical Methods in NLP 2005 (Vancouver)

Christian Retoré is on the program committee of Traitement Automatique du Langage Naturel 2006 (Leuven)

Christian Retoré is on the reading committee of Human Language Technology / North American Chapter of the ACL 2006 (New York)

Academic committees

Christian Bassac is a member of the hiring committee in linguistics of Université Bordeaux 3.

Joan Busquets is a member of the hiring committees in linguistics of Université Toulouse 2 and Université Bordeaux 3.

Gérard Huet is a nominated scientific personality on the board of governors of the Université Paris 7.

Renaud Marlet was a member of the hiring committee for junior research scientist at INRIA Futurs.

Henri Portine is a member of the hiring committees in linguistics of Université Paris 3 and Université Bordeaux 3.

Henri Portine is an elected member of the board of governors of the Université Bordeaux 3 and of Institut Universitaire de Formation des Maîtres d'Aquitaine.

Henri Portine is the head of the linguistic and literature faculty of Université Bordeaux 3.

Henri Portine is the head of the research team Text, Language, Cognition (JE 2385).

Christian Retoré is a member of the hiring committee in computer-science of Université Bordeaux 1.

Christian Retoré is a member of the committee of the faculty of mathematics and computer science of the Université Bordeaux 1.

Organization of events

Christian Bassac organised the Journées de Linguistique Anglaise, 26-27 October 2005.

Joan Busquets, Richard Moot, Christian Retoré organized the 5th international conference on Logical Aspects of Computational Linguistics, 28-30 April 2005.

Kim Gerdes and Maxime Amblard organized the weekly seminar Linguistique et informatique, Universités Bordeaux 1 et 3.

Teaching

Since all its members are university staff, Signes is heavily involved in teaching, both in the computer science curriculum (University Bordeaux 1) and in the linguistics curriculum (University of Bordeaux 3). The lectures whose topic is computational linguistics are:

Natural language processing, Bordeaux 1, PhD students in computer science (Christian Retoré)

Structures Informatiques et Logiques pour la Modélisation Linguistique, Parisian Master of Research in Informatics (MPRI). (Gérard Huet, Philippe de Groote)

Symbolic natural language processing, Bordeaux 1, 5th year in computer science (Christian Retoré)

Utterance acts and semantics, Bordeaux 3, 5th year in linguistics (Henri Portine)

The syntax of Wh-clauses and extraction, Bordeaux 3, 5th year in linguistics (Christian Bassac)

Finite state natural language processing, Bordeaux 1, 4th year in computer science (Christian Retoré)

The principle of charity: Quine and Davidson, Bordeaux 3, 4th year in linguistics (Joan Busquets)

Pragmatics, Bordeaux 3, 4th year in linguistics (Joan Busquets)

Word order and its formalization, Bordeaux 3, 4th year in linguistics (Kim Gerdes)

Linguistic formalisms, Bordeaux 3, 4th year in linguistics (Lionel Clément, Kim Gerdes, Renaud Marlet)

Thesis Juries

Christian Retoré is reviewing the habilitation of Isabelle Tellier (Modéliser l'acquisition de la syntaxe du langage naturel via l'hypothèse de la primauté du sens, Université de Lille 1, 8 December 2005).

Academic supervision

Student intern supervision – fifth year

Gérard Huet supervised the master thesis of Benoît Razet: Automates modulaires, Université Paris 7, 2005.

Christian Retoré and Kim Gerdes supervised the master thesis of Nicolas Letteron: Construction automatique d'un dictionnaire à partir de corpus, Université Bordeaux 1.

Christian Bassac, Renaud Marlet and Christian Retoré supervised the master thesis of Jules Vanier: Vers un modèle de représentation des connaissances dédié à l'analyse sémantique de la phrase, University of Paris 7 (and Ecole Centrale de Paris).

PhD supervision

Alain Lecomte is supervising the thesis work of Tran Vu Truc, Logique d'informations partielles pour le traitement des implicites. (Université Grenoble II)

Alain Lecomte and Christian Retoré are co-supervising the thesis work of Maxime Amblard, Calcul de représentations sémantiques dans les grammaires minimalistes. (Université Bordeaux 1)

Henri Portine and Renaud Marlet are supervising the thesis work of Emilie Voisin, Génération automatique d'énoncés en Langue des Signes Française. (Université Bordeaux 3)

Henri Portine is supervising the thesis work of Pierre Guitteny, Le passif en Langue des Signes Française. (Université Bordeaux 3)

Christian Retoré and Alexandre Dikovsky (Université de Nantes) are co-supervising the thesis work of Erwan Moreau, Acquisition de grammaires catégorielles et de grammaires de dépendances. (Université de Nantes)

Christian Retoré and Denis Delfitto (Università di Verona) are co-supervising the thesis work of Roberto Bonato, Algorithmes de calcul de représentations sémantiques à partir d'analyses de type générativiste et algorithmes inverses. (cotutored PhD Université Bordeaux 1 / Università di Verona)

Participation in colloquia, seminars, invitations

Visiting scientists

Lionel Clément (INRIA-Rocquencourt) visited Signes in January 2005. (seminar)

Jan van Eijck (Amsterdam) visited Signes in March 2005 (van Gogh PAI): Earley algorithm for parsing indexed grammars; definable generalised quantifiers.

Willemijn Vermaat, Matteo Capeletti (OTS, Utrecht) visited Signes in May 2005 (van Gogh PAI)

Cristiano Chiesi (Siena & MIT) visited Signes in May 2005 (incremental parsing of minimalist grammars)

Jean-Marie Pierrel (ATILF, Nancy) visited Signes in May 2005 (seminar)

Marie-Laure Guénot (LPL, Aix) visited Signes in June 2005 (seminar)

Laurence Danlos (TALANA, Paris) visited Signes in June 2005 (seminar)

Jens Michaelis (Tuebingen/Potsdam) and Hans-Martin Gaertner (Berlin) visited Signes for a week in September 2005 (working group)

Emilie Guimer de Neef (France Telecom), Emilie Chetelat and Loic Kervajan (France Telecom & DELIC) visited Signes in November 2005.

Michael Moortgat, Matteo Capeletti (OTS, Utrecht) visited Signes in December 2005 (van Gogh PAI)

Seminar Talks, Invitations

In January, G. Huet participated in the annual TECS Excellence week in Pune, India, as a member of the International Advisory Board of TRDDC (Tata Consultancy Services).

On April 13th, G. Huet was invited to give a talk at the International Conference on Rewriting Theory and Applications (RTA'05) in Nara, Japan. He talked on ``Rewriting before RTA''.

On April 26th, G. Huet was invited to deliver the Robin Milner lecture at University of Edinburgh. He talked on ``Design of a Computational Linguistics Platform''.

Participation in conferences and summer schools

Christian Bassac and Richard Moot attended ESSLLI 2005 in Edinburgh.

Maxime Amblard, Pierre Guitteny, Christian Retoré and Emilie Voisin attended the conference TALN 2005, Dourdan, June 2005.

Bibliography

N. Asher, D. Hardt, J. Busquets. Discourse Parallelism, Ellipsis, and Ambiguity. Journal of Semantics 18(1), 2001, 1–25.

C. Bassac. Principes de morphologie anglaise. Linguistica, Presses Universitaires de Bordeaux, 2004.

O. de Langhe, P. Guitteny, H. Portine, C. Retoré. A propos des structures OSV en Langue des Signes Française. Silexicales 4, 2004, 115–130.

K. Gerdes. Topologie et grammaires formelles de l'allemand. Thèse de Doctorat, Université Paris 7, 2002.

G. Huet. Transducers as lexicon morphisms, phonemic segmentation by euphony analysis, application to a sanskrit tagger. Journal of Functional Programming, 2005. http://pauillac.inria.fr/~huet/PUBLIC/tagger.ps

A. Lecomte. Categorial Grammar for Minimalism. In C. Casadio, P. Scott, R. Seely (eds.), Logic and Grammar, CSLI, 2005.

R. Moot. Proof nets for linguistic analysis. Ph.D. Thesis, UIL-OTS, Universiteit Utrecht, 2002.

H. Portine. La syntaxe de Damourette et Pichon comme outil de représentation du sens. Modèles linguistiques 23(2), 2002, 21–46.

C. Retoré. Logique linéaire et syntaxe des langues. Mémoire d'habilitation à diriger des recherches, Université de Nantes, January 2002.

C. Retoré, E. Stabler. Generative Grammar in Resource Logics. Journal of Research on Language and Computation 2(1), 2004, 3–25.

M. Amblard, R. Marlet. Student Session of the 5th International Conference on Logical Aspects of Computational Linguistics, LACL 2005, Bordeaux, France, April 29th, 2005. Research Report RR-1356-05, LaBRI-CNRS, 2005.

P. Blache, E. P. Stabler, J. Busquets, R. Moot (eds.). Logical Aspects of Computational Linguistics, 5th International Conference, LACL 2005, Bordeaux, France, April 28-30, 2005, Proceedings. Lecture Notes in Computer Science 3492, Springer, 2005.

J. Busquets. Logique et langage : apports de la philosophie médiévale. Presses Universitaires de Bordeaux, 2006.

C. Bassac. A compositional treatment for English compounds. Research in Language (ISSN 1570-7075), October 2005.

C. Bassac, M. Ciçek. Morphologie de la prédication verbale et non verbale en turc. In J. M. Merle (ed.), La Prédication, Ophrys, 2005, to appear.

J. Busquets. A propos de fer-ho (le faire) anaphorique en catalan. In F. Lambert, H. Nølke (eds.), La syntaxe au coeur de la grammaire, Rivages Linguistiques, Presses Universitaires de Rennes, 2005, 45–54.

J. Busquets. Négation, présupposition, et ellipse en Catalan. Cahiers de Grammaire 30 (ISSN 0242-1593), 2006.

J. Busquets. Stripping and Ellipsis in Catalan: What is Deleted and When? Probus (ISSN 0921-4771), 2006, to appear.

P. Flajolet, G. Huet. Mathématiques et informatique. In J.-C. Yoccoz (ed.), Les mathématiques dans le monde contemporain, Rapport sur la science et la technologie no 20, Académie des sciences, 2005.

K. Gerdes. Sur la non-équivalence des représentations syntaxiques ou comment la représentation en X-barre nous amène au concept du mouvement. Cahiers de grammaire 30 (ISSN 0242-1593), 2005.

G. Huet. A functional toolkit for morphological and phonological processing, application to a Sanskrit tagger. Journal of Functional Programming 15(4) (ISSN 0956-7968), 2005, 573–614. http://pauillac.inria.fr/~huet/PUBLIC/tagger.ps

G. Huet. Internet Challenges for Informatics Research. Progress in Informatics 1,2 (ISSN 1349-8614), 2005.

R. Moot. Extraction of Type-Logical Supertags from the Spoken Dutch Corpus. In S. Bangalore, A. Joshi (eds.), Complexity of Lexical Descriptions and its Relevance to Natural Language Processing: A Supertagging Approach, MIT Press, 2005, to appear.

R. Moot, C. Retoré. Les indices pronominaux du français dans les grammaires catégorielles. Linguisticae Investigationes (ISSN 0378-4169), 2005, to appear.

H. Portine. Le canard, de la désignation d'un oiseau à celle d'un journal ou l'homonymie peut-elle être cognitive ? In C. Casseville, P. Baudorre (eds.), L'amitié, ce pur fleuve... Hommage à Bernard Cocula, L'Esprit du Temps, 2005, 95–116.

H. Portine. Vers une analyse syntaxique des "relatives quotidiennes" en français. In F. Lambert, H. Nølke (eds.), La syntaxe au coeur de la grammaire, Rivages Linguistiques, Presses Universitaires de Rennes, 2005, 259–270.

C. Retoré. Syntaxe et Traitement Automatique des Langues. In F. Lambert, H. Nølke (eds.), La syntaxe au coeur de la grammaire, Rivages Linguistiques, Presses Universitaires de Rennes, 2005, 271–286.

A. Naït Abdallah, A. Lecomte. On Expressing Vague Quantification and Scalar Implicatures in the Logic of Partial Information. In P. Blache, E. P. Stabler, J. Busquets, R. Moot (eds.), Logical Aspects of Computational Linguistics, 5th International Conference, LACL 2005, Bordeaux, France, April 28-30, 2005, Proceedings, Lecture Notes in Computer Science 3492, Springer, 2005, 205–220.

M. Amblard. Counting dependencies and Minimalist Grammars. 2005.

M. Amblard. Synchronisation syntaxe sémantique, des grammaires minimalistes catégorielles aux Constraint Languages for Lambda Structures. 2005.

H. Anoun. Une bibliothèque pour le traitement des langues naturelles. Journées francophones des langages applicatifs, 2005.

H. Anoun. Reasoning on Multimodal logic with the Calculus of Inductive Constructions. 12th Logic for Programming Artificial Intelligence and Reasoning, LPAR'06, Springer Verlag, 2006.

C. Bassac, P. Bouillon. Qualia structure and anaphoric reference in compounds. Proceedings of the Third International Workshop on Generative Approaches to the Lexicon, E.T.I., 2005, 27–35.

R. Bonato. Towards a Computational Treatment of Binding Theory. In P. Blache, E. P. Stabler, J. Busquets, R. Moot (eds.)
Logical Aspects of Computational Linguistics, 5th International Conference, LACL 2005, Bordeaux, France, April 28-30, 2005, Proceedings Lecture Notes in Computer Science 3492 Springer 2005 35-50 German Partial VP Fronting in a Meaning-Text Approach Kim Gerdes K. Societas Linguistica Europaea 38 2005 On the Descriptive Adequacy of Topology Kim Gerdes K. Sylvain Kahane S. Hi-Yon Yoo H.-Y. 2nd International Conference on Meaning – MTT 2005 Text Theory (MTT-2005) 2005 Topological Word Order of Modern Greek Kim Gerdes K. Hi-Yon Yoo H.-Y. 7th International Conference on Greek Linguistics, ICGL, York 2005 Passif et Inverse en Langue des Signes française Pierre Guitteny P. ATALA Atelier Traitement Automatique des Langues des Signes – TALN 2005, Dourdan 2005 321–325 Design of a Computational Linguistics Platform for Sanskrit Gérard Huet G. Logical Aspects of Computational Linguistics, LACL 05 Invited Lecture 2005 Un système de traitement informatique du sanskrit Gérard Huet G. Journé e ATALA: Traitement Automatique des Langues Anciennes Invited Lecture 2005 Flexion et ordre des signes en Langue des Signes Française Émilie Voisin É. Atelier des doctorants en Linguistique Université Paris 7 2005 Stripping vs VP-Ellipsis in Catalan Joan Busquets J. Technical report RR-5616 INRIA 2005 http://www.inria.fr/rrrt/rr-5616.html The Zen Computational Linguistics Toolkit, Version 2.0 Gérard Huet G. Technical report INRIA 2005 http://sanskrit.inria.fr/ZEN/ The Reactive Engine for Modular Transducers Gérard Huet G. Benoît Razet B. Technical report 2005 Construction automatique de dictionnaire à partir de corpus Nicolas Letteron N. Master Thesis Université Bordeaux 1 2005 Grail Richard Moot R. Technical report 2005 http://www.labri.fr/perso/moot/grail3.html Automates modulaires Benoît Razet B. Master Thesis Université Paris 7 2005 The Logic of Categorial Grammars – Lecture Notes Christian Retoré C. 
108 pp Research Report 5703 INRIA 2005 http://www.inria.fr/rrrt/rr-5703.html Vers un modèle de représentation des connaissances dédié à l'analyse sémantique de la phrase Jules Vanier J. Master Thesis Ecole Centrale de Paris - Université Paris 6 - INRIA 2005 Finite-State Morphology: Xerox Tools and Techniques Kenneth R. Beesley K. R. Lauri Karttunen L. Cambridge University Press 2002 Constraints and Resources in Natural Language Syntax and Semantics Gosse Bouma G. Erhard Hinrichs E. Geert-Jan M. Kruijff G.-J. M. Richard Oehrle R. distributed by Cambridge University Press CSLI 1999 The minimalist program Noam Chomsky N. MIT Press

Cambridge, MA

1995 Semantic readings of proof nets Philippe de Groote P. Christian Retoré C. Geert-Jan Kruijff G.-J. Glyn Morrill G. Dick Oehrle D. Formal Grammar, Prague FoLLI 1996 57–70 Dependencies on the other side of the Curtain Alexander Dikovsky A. Larissa Modina L. Traitement Automatique des Langues 41 1 2000 67-95 On the Creation and Use of English Compound Nouns Pamela Downing P. Language 53 4 1977 810–842 Logic, Language and Meaning – Volume 2: Intensional logic and logical grammar L. T. F. Gamut L. T. F. The University of Chicago Press 1991 Linear Contexts and the Sharing Functor: Techniques for Symbolic Computation. Gérard Huet G. Fairouz Kamareddine F. Thirty Five Years of Automating Mathematics Kluwer 2003 http://pauillac.inria.fr/~huet/PUBLIC/DB.pdf Zen and the Art of Symbolic Computing: Light and Fast Applicative Algorithms for Computational Linguistics Gérard Huet G. Practical Aspects of Declarative Languages (PADL) symposium, New Orleans Invited lecture 2003 http://pauillac.inria.fr/~huet/PUBLIC/padl.pdf Transducers as lexicon morphisms, phonemic segmentation by euphony analysis, application to a sanskrit tagger Gérard Huet G. Journal of Functional Programming 2005 http://pauillac.inria.fr/~huet/PUBLIC/tagger.ps The Architecture of the Language Faculty Linguistic Inquiry Monographs Ray Jackendoff R. 28 M.I.T. Press

Cambridge, Massachusetts

1995 Tree Adjunct Grammar Aravind Joshi A. Leon Levy L. Masako Takahashi M. Journal of Computer and System Sciences 10 1975 136–163 Tree Adjoining Grammars Aravind Joshi A. Yves Schabes Y. G. Rozenberg G. A. Salomaa A. Handbook of Formal Languages, Berlin 3 2 Springer Verlag 1996 The convergence of mildly context-sensitive grammar formalisms Aravind Joshi A. K. Vijay-Shanker K. David Weir D. P. Sells P. S. Schieber S. T. Wasow T. Fundational issues in natural language processing MIT Press 1991 From Discourse to Logic H. Kamp H. U. Reyle U. D. Reidel

Dordrecht

1993 Regular Models of Phonological Rule Systems Ronald M. Kaplan R. M. Martin Kay M. Computational Linguistics 20,3 1994 331–378 Applications of Finite-State Transducers in Natural Language Processing Lauri Karttunen L. Proceedings, CIAA-2000 2000 A general computational model for word-form recognition and production K. Koskenniemi K. 10th International Conference on Computational Linguistics 1984 Extending Lambek grammars: a logical account of minimalist grammars Alain Lecomte A. Christian Retoré C. Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, ACL 2001, Toulouse ACL July 2001 354–361 http://www.labri.fr/Recherche/LLA/signes/ Towards a Minimal Logic for Minimalist Grammars: a Transformational Use of Lambek Calculus Alain Lecomte A. Christian Retoré C. Formal Grammar, FG`99 FoLLI 1999 83–92 Pointing out differences: ASL pronouns in syntactic theory Diane Lillo-Martin D. Edward S. Klima E. S. Susan D. Fisher S. D. Patrica Siple P. Theoretical issues in sign language research – Vol 1 Linguistics University of Chicago Press 1990 191–210 Universal Grammar and American Sign Language: Setting the Null Argument Parameters Diane Lillo-Martin D. Kluwer 1991 Foundations of statistical natural language processing Christopher Manning C. Hinrich Schutze H. MIT Press 1999 Derivational minimalism is mildly context sensitive Jens Michaelis J. Michael Moortgat M. Logical Aspects of Computational Linguistics, LACL`98, selected papers LNCS/LNAI 2014 Springer-Verlag 2001 179–198 Logical Aspects of Computational Linguistics, LACL`98, selected papers LNCS/LNAI Michael Moortgat M. 2014 Springer-Verlag 2001 Categorial Type Logic Michael Moortgat M. Johan van Benthem J. Alice ter Meulen A. Handbook of Logic and Language, Amsterdam 2 North-Holland Elsevier 1996 93–177 Proof nets for linguistic analysis Richard Moot R. Ph. D. 
Thesis UIL-OTS, Universiteit Utrecht 2002 The Syntax of American Sign Language – Functional Categories and Hierarchical Structure Carol Neidle C. Judy Kegl J. Dawn MacLaughlin D. Benjamin Bahan B. Robert G. Lee R. G. MIT Press 2000 Traitement Automatique des Langues: analyse syntaxique dans les grammaires catégorielles Thomas Poussevin T. Jean-François Deverge J.-F. Fahd Haiti F. Anthony Herbé A. Mémoire de Maîtrise – TER Université Bordeaux 1 May 2003 The Generative Lexicon James Pustejovsky J. MIT Press 1995 Logical Aspects of Computational Linguistics, LACL`96 LNCS/LNAI Christian Retoré C. 1328 Springer-Verlag 1997 Generative Grammar in Resource Logics Christian Retoré C. Edward Stabler E. Journal of Research on Language and Computation 2(1) 2004 3–25 Special Issue on Resource Logics and Minimalist Grammars C. Retoré C. E. Stabler E. 2(1) Kluwer 2004 Handbook of Formal Languages G. Rozenberg G. A. Salomaa A. Springer Verlag

Berlin

1997 Derivational Minimalism Edward Stabler E. Christian Retoré C. Logical Aspects of Computational Linguistics, LACL`96 LNCS/LNAI 1328 Springer-Verlag 1997 68–95 Remnant movement and structural complexity Edward Stabler E. Gosse Bouma G. Erhard Hinrichs E. Geert-Jan M. Kruijff G.-J. M. Richard Oehrle R. Constraints and Resources in Natural Language Syntax and Semantics distributed by Cambridge University Press CSLI 1999 299–326 Lessons in SignWriting Valerie Sutton V. 2002 http://www.signwriting.org The collected papers of Richard Montague Richmond Thomason R. Yale University Press 1974 Handbook of Logic and Language Johan van Benthem J. Alice ter Meulen A. North-Holland Elsevier

Amsterdam

1997 Representing Discourse in Context Jan van Eijck J. Hans Kamp H. Johan van Benthem J. Alice ter Meulen A. Handbook of Logic and Language, Amsterdam 3 North-Holland Elsevier 1996 179–237 Communicative Organization in Natural Language: The Semantic-communicative Structure of Sentences Igor Melcuk I. John Benjamins 2001 Dependency syntax – theory and practice Linguistics Igor Melcuk I. State University of New York Press 1988