Section: Overall Objectives
Scientific context
Computational linguistics is a discipline at the intersection of computer science and linguistics. On the theoretical side, it aims to provide computational models of the human language faculty. On the applied side, it is concerned with natural language processing and its practical applications.
From a structural point of view, linguistics is traditionally organized into the following sub-fields:
- Syntax, the study of language structure, i.e., the way words combine into grammatical phrases and sentences.
- Semantics, the study of meaning at the levels of words, phrases, and sentences.
- Pragmatics, the study of the ways in which the meaning of an utterance is affected by its context.
Computational linguistics is concerned with all these fields. Consequently, various computational models, whose application domains range from phonology to pragmatics, have been developed. Among these, logic-based models play an important part, especially at the “higher” levels.
At the level of syntax, generative grammars [34] may be seen as basic inference systems, while categorial grammars [56] are based on substructural logics specified by Gentzen sequent calculi. Finally, model-theoretic grammars [75] amount to sets of logical constraints to be satisfied.
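To make the reading of generative grammars as inference systems concrete, here is a minimal sketch of context-free recognition by forward-chaining deduction. The toy grammar, the item format `(category, start, end)`, and the function name `recognize` are illustrative assumptions, not part of the project. Lexical rules provide the axioms, each binary rule acts as an inference rule, and a sentence is accepted when the goal item for the start symbol is derivable.

```python
# Toy grammar (hypothetical, for illustration only), in binary form.
LEXICAL = {("Det", "every"), ("N", "man"), ("VP", "sleeps")}
BINARY = {("S", "NP", "VP"), ("NP", "Det", "N")}


def recognize(words, lexical=LEXICAL, binary=BINARY, start="S"):
    """Return True if the item (start, 0, len(words)) is derivable."""
    # Axioms: one item (A, i, i+1) for each lexical rule A -> words[i].
    items = {
        (cat, i, i + 1)
        for i, word in enumerate(words)
        for (cat, w) in lexical
        if w == word
    }
    # Inference rule: from (B, i, k) and (C, k, j) and A -> B C, infer (A, i, j).
    # Apply it until no new item can be derived (a fixpoint).
    changed = True
    while changed:
        changed = False
        for (b, i, k) in list(items):
            for (c, k2, j) in list(items):
                if k != k2:
                    continue
                for (a, left, right) in binary:
                    if (left, right) == (b, c) and (a, i, j) not in items:
                        items.add((a, i, j))
                        changed = True
    # Goal item: the start symbol spanning the whole input.
    return (start, 0, len(words)) in items
```

For instance, `recognize("every man sleeps".split())` derives the goal item, whereas `recognize("man every sleeps".split())` does not. The naive fixpoint loop favors readability over efficiency; a chart parser would index items by position.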
At the level of semantics, the most common approaches derive from Montague grammars [60], [61], [62], which are based on the simply typed λ-calculus and Church's simple theory of types [35]. In addition, various logics (modal, hybrid, intensional, higher-order, etc.) are used to express logical semantic representations.
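As an illustration of the Montague-style approach, the following sketch encodes word meanings as λ-terms (here, Python functions) and obtains sentence meanings by function application. The finite model, the lexicon, and all names are assumptions made for the example; the types in the comments follow the usual convention where e stands for entities and t for truth values.

```python
# A small finite model (hypothetical, for illustration only).
ENTITIES = {"john", "mary", "rex"}
MAN = {"john"}
WOMAN = {"mary"}
SLEEPS = {"john", "rex"}

# Type e -> t: one-place predicates.
man = lambda x: x in MAN
woman = lambda x: x in WOMAN
sleeps = lambda x: x in SLEEPS

# Type (e -> t) -> (e -> t) -> t: determiners as generalized quantifiers.
every = lambda p: lambda q: all(q(x) for x in ENTITIES if p(x))
some = lambda p: lambda q: any(q(x) for x in ENTITIES if p(x))

# Type (e -> t) -> t: a proper name lifted to a quantifier.
john = lambda q: q("john")

# Sentence meanings are obtained by function application alone.
every_man_sleeps = every(man)(sleeps)      # all men are in SLEEPS
some_woman_sleeps = some(woman)(sleeps)    # no woman is in SLEEPS
john_sleeps = john(sleeps)                 # "john" is in SLEEPS
```

In this model, `every_man_sleeps` and `john_sleeps` evaluate to `True`, and `some_woman_sleeps` to `False`. Python does not check the types given in the comments; a typed language would enforce them, which is closer to the simply typed setting.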
At the level of pragmatics, the situation is less clear. The word pragmatics was introduced by Morris [66] to designate the branch of philosophy of language that studies, besides linguistic signs, their relation to their users and the possible contexts of use. This definition was not very precise, and for a long time several authors considered (and some still consider) pragmatics the wastebasket of syntax and semantics [28]. Nevertheless, as far as discourse processing is concerned (which includes pragmatic problems such as pronominal anaphora resolution), logic-based approaches have also been successful. In particular, Kamp's Discourse Representation Theory [52] gave rise to sophisticated “dynamic” logics [46].
The situation, however, is less satisfactory than it is at the semantic level. On the one hand, we are facing a kind of logical “tower of Babel”: the various logic-based pragmatic models that have been developed, while sharing underlying mathematical concepts, differ in several respects and are too often based on ad hoc features. As a consequence, they are difficult to compare and appear more as competitors than as collaborative theories that could be integrated. On the other hand, several phenomena related to discourse dynamics (e.g., context updating, presupposition projection and accommodation, contextual reference resolution) still lack deep logical explanations. We strongly believe, however, that this situation can be improved by applying to pragmatics the same approach Montague applied to semantics, using the standard tools of mathematical logic.
Accordingly:
The overall objective of the Sémagramme project is to design and develop new unifying logic-based models, methods, and tools for the semantic analysis of natural language utterances and discourses. This includes the logical modeling of pragmatic phenomena related to discourse dynamics. Typically, these models and methods will be based on standard logical concepts (stemming from formal language theory, mathematical logic, and type theory), which should make them easy to integrate.
The project is organized along three research directions (namely, Syntax-semantics interface, Discourse dynamics, and Common basic resources), which interact as explained below.