Project-team Calligramme's aim is the development of tools and methods that stem from proof theory and, in particular, linear logic. Two fields of application are emphasized: in computational linguistics, the modelling of the syntax and semantics of natural languages; in software engineering, the study of the termination and complexity of programs.
Project-team Calligramme's research is conducted at the juncture of mathematical logic and computer science. The scientific domains on which our investigations are based are proof theory and the λ-calculus, and more specifically linear logic. This latter theory, the brainchild of J.-Y. Girard, results from a finer analysis of the part played by structural rules in Gentzen's sequent calculus. These rules, traditionally considered as secondary, specify that the sequences of formulas that appear in sequents can be treated as (multi)sets. In the case of intuitionistic logic, there are three of them:
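In their standard formulation (the notation is an assumption made here: Γ and Δ stand for sequences of formulas, and A, B, C for formulas), the three rules are weakening, contraction and exchange:

```latex
\[
\frac{\Gamma \vdash C}{\Gamma, A \vdash C}\ (\text{Weakening})
\qquad
\frac{\Gamma, A, A \vdash C}{\Gamma, A \vdash C}\ (\text{Contraction})
\qquad
\frac{\Gamma, A, B, \Delta \vdash C}{\Gamma, B, A, \Delta \vdash C}\ (\text{Exchange})
\]
```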
These rules have important logical weight: the weakening rule embodies the fact that some hypotheses may be dropped during a derivation; in a similar fashion, the contraction rule specifies that any hypothesis can be used an unlimited number of times; as for the exchange rule, it stipulates that no order of priority holds between hypotheses. Thus, the presence of the structural rules in the ordinary sequent calculus strongly conditions the properties of the logic that results. For example, in the Gentzen-style formulations of classical or intuitionistic logic, the contraction rule by itself entails the undecidability of the predicate calculus. In the same manner, the use of the weakening and contraction rules in the right half of the sequent in classical logic is responsible for the latter's non-constructive aspects.
According to this analysis, linear logic can be understood as a system that reconciles the constructivist aspect of intuitionistic logic with the symmetry of classical logic. As in intuitionistic logic, the constructive character comes from the banning of the weakening and contraction rules in the right part of the sequent. But simultaneously, in order to preserve symmetry in the system, the same rules are also rejected in the other half.
Propositional linear logic (negation, multiplicatives and additives together form rudimentary linear logic)

              Negation   Multiplicatives   Additives   Exponentials
Negation      A^⊥
Conjunction              A ⊗ B             A & B
Disjunction              A ⅋ B             A ⊕ B
Implication              A ⊸ B
Constants                1, ⊥              ⊤, 0
Modalities                                             !A, ?A
The resulting system, called rudimentary linear logic, presents many interesting properties. It is endowed with four logical connectives (two conjunctions and two disjunctions) and the four constants that are their corresponding units. It is completely symmetrical, although constructive, and equipped with an involutive negation. As a consequence, rules similar to De Morgan's laws hold in it.
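For instance, writing ⊗ and ⅋ for the multiplicative conjunction and disjunction, and & and ⊕ for the additive ones (the usual notation, assumed here), the dualities read as follows:

```latex
% \parr is assumed to come from the cmll package.
\[
(A \otimes B)^{\perp} = A^{\perp} \parr B^{\perp}, \qquad
(A \parr B)^{\perp} = A^{\perp} \otimes B^{\perp},
\]
\[
(A \mathbin{\&} B)^{\perp} = A^{\perp} \oplus B^{\perp}, \qquad
(A \oplus B)^{\perp} = A^{\perp} \mathbin{\&} B^{\perp}, \qquad
A^{\perp\perp} = A .
\]
```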
In rudimentary linear logic, any hypothesis must be used once and only once during a derivation. This property, which allows linear logic to be considered as a resource calculus, is due, as we have seen, to the rejection of structural rules. But their total absence also implies that rudimentary linear logic is a much weaker system than intuitionistic or classical logic. Therefore, in order to restore its strength, it is necessary to augment the system with operators that recover the logical power of the weakening and contraction rules. This is done via two modalities that give tightly controlled access to the structural rules. Thus, linear logic does not question the usefulness of the structural rules but instead emphasizes their logical importance. In fact, it rejects them as epitheoretical rules in order to incorporate them as logical rules that are embodied in new connectives. This original idea is what gives linear logic all its subtlety and power.
The finer decomposition that linear logic brings to traditional logic has another consequence: the Exchange rule, which so far has been left as is, is now in a quite different position, being the only one of the traditional structural rules that is left. A natural extension of Girard's original program is to investigate its meaning, in other words, to see what happens to the rest of the logic when Exchange is tampered with. Two standard algebraic laws are contained in it: commutativity and associativity. Relaxing these laws entails looking for non-commutative and non-associative variants of linear logic; there are now several examples of these. The natural outcome of this proliferation is a questioning of the nature of the structure that binds formulas together in a sequent: what is the natural general replacement of the notion of (multi)set, as applied to logic? Such questions are important for Calligramme and are addressed in the team's publications.
The activities of project-team Calligramme are organized around three research actions:
Proof nets, sequent calculus and typed calculi;
Grammatical formalisms;
Implicit complexity of computations.
The first of these is essentially theoretical; the other two, which have both a theoretical and an applied character, are our main fields of application.
The aim of this action is the development of the theoretical tools that we use in our other research actions. We are interested, in particular, in the notion of formal proof itself, as much from a syntactical point of view (sequential derivations, proof nets, terms), as from a semantical point of view.
Proof nets are graphical representations (in the sense of graph theory) of proofs in linear logic. Their role is very similar to that of λ-terms for more traditional logics; as a matter of fact, there are several back-and-forth translations that relate classes of λ-terms to classes of proof nets. In addition to their strong geometric character, another difference between proof nets and λ-terms is that the proof net structure of a proof of a formula T can be considered as structure which is added to T, as a coupling between the atomic formula nodes of the usual syntactic tree graph of T. Since not all couplings correspond to proofs of T, there is a need to distinguish the ones that actually do; a means of doing so is called a correctness criterion.
The discovery of new correctness criteria remains an important research problem, as much for Girard's original linear logic as for the field of non-commutative logics. Some criteria are better adapted to some applications than others. In particular, in the case of automatic proof search, correctness criteria can be used as invariants during the inductive process of proof construction.
The theory of proof nets also presents a dynamic character: cut elimination. This embodies a notion of normalization (or evaluation) akin to β-reduction in the λ-calculus.
As we said above, until the invention of proof nets, the principal tool for representing proofs in constructive logics was the λ-calculus. This is due to the Curry-Howard isomorphism, which establishes a correspondence between natural deduction systems for intuitionistic logics and typed λ-calculi.
Although the Curry-Howard isomorphism owes its existence to the functional character of intuitionistic logic, it can be extended to fragments of classical logic. It turns out that some constructions that one meets in functional programming languages, such as control operators, can presently only be explained by the use of deduction rules that are related to proof by contradiction.
This extension of the Curry-Howard isomorphism to classical logic, together with its applications, has a permanent place as a research field in the project.
Lambek's syntactic calculus, which plays a central part in the theory of categorial grammars, can be seen a posteriori as a fragment of linear logic. As a matter of fact, linear logic provides a mathematical framework that enables extensions of Lambek's original calculus as well as extensions of categorial grammars in general. The aim of this work is the development of a model, in the sense of computational linguistics, which is more flexible and efficient than the presently existing categorial models.
The relevance of linear logic for natural language processing is due to the notion of resource sensitivity. A language (natural or formal) can indeed be interpreted as a system of resources. For example, a sentence like "The man that Mary saw Peter slept" is incorrect because it violates an underlying principle of natural languages, according to which verbal valencies must be realized once and only once. Categorial grammars formalize this idea by specifying that a verb such as "saw" is a resource which will give a sentence S in the presence of a nominal subject phrase NP and one and only one direct object NP. This gives rise to the following type assignment:
Mary, Peter: NP
saw: (NP\S)/NP
where the slash (/) (resp. the backslash (\)) is interpreted as a fraction pairing that simplifies to the right (resp. to the left). However, one notices very soon that this simplification scheme, which is the basis of Bar-Hillel grammars, is not sufficient.
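The simplification scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of categories, the helper names `over`, `under`, `simplify` and `recognizes`, and the chart-based search are all choices made here. Only the three lexical entries come from the type assignment above.

```python
from itertools import product

def over(result, arg):
    # Category result/arg: looks for an `arg` on its right.
    return ("/", result, arg)

def under(arg, result):
    # Category arg\result: looks for an `arg` on its left.
    return ("\\", arg, result)

def simplify(left, right):
    """Categories obtained by cancelling two adjacent categories."""
    out = set()
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        out.add(left[1])       # (A/B) B  =>  A
    if isinstance(right, tuple) and right[0] == "\\" and right[1] == left:
        out.add(right[2])      # A (A\B)  =>  B
    return out

def recognizes(words, lexicon, goal="S"):
    """Chart-based check that `words` simplifies to the category `goal`."""
    n = len(words)
    if n == 0:
        return False
    chart = {(i, i + 1): set(lexicon.get(w, ())) for i, w in enumerate(words)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            cell = set()
            for j in range(i + 1, k):
                for l, r in product(chart[(i, j)], chart[(j, k)]):
                    cell |= simplify(l, r)
            chart[(i, k)] = cell
    return goal in chart[(0, n)]

# Lexicon taken from the type assignment above.
LEXICON = {
    "Mary": ["NP"],
    "Peter": ["NP"],
    "saw": [over(under("NP", "S"), "NP")],   # (NP\S)/NP
}

print(recognizes("Mary saw Peter".split(), LEXICON))
print(recognizes("Mary saw".split(), LEXICON))
```

With this lexicon, "Mary saw Peter" simplifies to S, whereas "Mary saw" does not, because the object valency of the verb remains unrealized.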
Lambek solves this problem by interpreting slashes and backslashes as implicative connectives. They then obey not only the modus ponens law, which turns out to be Bar-Hillel's simplification scheme, but also the corresponding introduction rules.
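In a natural-deduction presentation, these rules can be written as follows. The notation is an assumption made here: Γ and Δ are sequences of types, Γ is non-empty in the introduction rules as in Lambek's original calculus, and A\B is the type that looks for an A on its left.

```latex
% Elimination rules (modus ponens, i.e. Bar-Hillel's simplification scheme)
\[
\frac{\Gamma \vdash B/A \qquad \Delta \vdash A}{\Gamma, \Delta \vdash B}\ (/E)
\qquad
\frac{\Gamma \vdash A \qquad \Delta \vdash A\backslash B}{\Gamma, \Delta \vdash B}\ (\backslash E)
\]
% Introduction rules
\[
\frac{\Gamma, A \vdash B}{\Gamma \vdash B/A}\ (/I)
\qquad
\frac{A, \Gamma \vdash B}{\Gamma \vdash A\backslash B}\ (\backslash I)
\]
```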
The Lambek calculus does have its own limitations. Among other things it cannot treat syntactical phenomena like medial extraction and crossed dependencies. Thus the question arises: how can we extend the Lambek calculus to treat these and related problems? This is where linear logic comes into play, by offering an adequate mathematical framework for attacking this question. In particular proof nets appear as the best adapted approach to syntactical structure in the categorial framework.
Proof nets offer a geometrical interpretation of proof construction. Premises are represented by proof net fragments with inputs and outputs which respectively model needed and offered resources. These fragments must then be combined by pairing inputs and outputs according to their types. This process can also be interpreted in a model-theoretical fashion where fragments are regarded as descriptions of certain classes of models: the intuitionistic multiplicative fragment of linear logic can be interpreted on directed acyclic graphs, while for the implicative fragment, trees suffice.
This perspective shift from proof theory to model theory remains founded on the notion of resource sensitivity (e.g. in the form of polarities and their neutralization) but affords us the freedom to interpret these ideas in richer classes of models and leads to the formalism of Interaction Grammars. For example:
where previously we only considered simple categories with polarities, we can now consider complex categories with polarized features.
We can also adopt more expressive tree description languages that allow us to speak about dominance and precedence relations between nodes. In this fashion we espouse and generalize the monotonic version of Tree Adjoining Grammars (TAG) as proposed by Vijay-Shanker.
Contrary to TAG where tree fragments can only be inserted, Interaction Grammars admit models where the interpretations of description fragments may overlap.
The construction of software which is certified with respect to its specifications is more than ever a great necessity. It is crucial to ensure, while developing a certified program, the quality of the implementation in terms of efficiency and computational resources. Implicit complexity is an approach to the analysis of the resources that are used by a program. Its tools come essentially from proof theory. The aim is to compile a program while certifying its complexity.
The metatheory of programming traditionally answers questions with respect to a specification, like termination. These properties all happen to be extensional, that is, described purely in terms of the relation between the input of the program and its output. However, other properties, like the efficiency of a program and the resources that are used to effect a computation, are excluded from this methodology. The reason for this is inherent to the nature of the questions that are posed. In the first case we are treating extensional properties, while in the second case we are inquiring about the manner in which a computation is effected. Thus, we are interested in intensional properties of programs.
The complexity of a program is a measure of the resources that are necessary for its execution. The resources taken into account are usually time and space. The theory of complexity studies the problems and the functions that are computable given a certain amount of resources. One should not identify the complexity of functions with the complexity of programs, since a function can be implemented by several programs. Some are efficient, others are not.
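The distinction between the complexity of a function and that of a program can be made concrete with a small sketch. The Fibonacci function is an illustrative choice made here, and the names `fib_naive`, `fib_iter` and `run`, as well as the step counter, are hypothetical. The two programs compute the same function, so they are extensionally equal, but they use very different amounts of time.

```python
def fib_naive(n, steps):
    # Doubly recursive program: the number of calls grows exponentially with n.
    steps[0] += 1
    if n < 2:
        return n
    return fib_naive(n - 1, steps) + fib_naive(n - 2, steps)

def fib_iter(n, steps):
    # Iterative program for the same function: n loop iterations.
    a, b = 0, 1
    for _ in range(n):
        steps[0] += 1
        a, b = b, a + b
    return a

def run(program, n):
    """Return the value computed by `program` on n and the number of counted steps."""
    steps = [0]
    value = program(n, steps)
    return value, steps[0]

for program in (fib_naive, fib_iter):
    value, cost = run(program, 15)
    print(program.__name__, value, cost)
```

Both runs return the same value; only the step counts differ, which is exactly the kind of intensional information that an input/output specification does not capture.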
One achievement of complexity theory is the ability to tell the ``programming expert'' the limits of his art, whatever the amount of gigabytes and megaflops available to him. Another achievement is the development of a mathematical model of algorithmic complexity. But when facing these models, the programming expert is often flabbergasted. There are several reasons for this; let us illustrate the problem with two examples. The linear acceleration theorem states that any program which can be executed in time T(n) (where n is the size of the input) can be transformed into an equivalent program that can be executed in time εT(n), where ε is ``as small as we want''. It turns out that this result has no counterpart in real life. On the other hand, a function is feasible if it can be calculated by a program whose complexity is acceptable. The class of feasible functions is often identified with the class Ptime of functions that are calculable in polynomial time. A typical kind of result is the definition of a programming language together with the proof that the class of functions represented by that language is exactly the class Ptime. This type of result does not answer the programming expert's needs because such a programming language does not allow the ``right algorithms'', the ones he uses daily. The gulf between the two disciplines is also explained by differences in points of view. The theory of complexity, daughter of the theory of computability, has kept an extensional point of view in its modelling practices, while the theory of programming is intrinsically intensional.
The need to reason about programs is a relevant issue in the process of software development. The certification of a program is an essential property, but it is not the only one. Showing the termination of a program that has exponential complexity does not make sense with respect to our reality. Thus arises the need to construct tools for reasoning about algorithms. The theory of implicit complexity of computations takes on a vast project, namely the analysis of the complexity of algorithms.
Abstract Categorial Grammars (ACGs) are a new categorial formalism based on Girard's linear logic. This formalism, which sticks to the spirit of current type-logical grammars, offers the following features:
Any ACG generates two languages, an abstract language and an object language. The abstract language may be thought of as a set of abstract grammatical structures, and the object language as the set of concrete forms generated from these abstract structures. Consequently, one has direct control over the parse structures of the grammar.
The languages generated by the ACGs are sets of linear λ-terms. This may be seen as a generalization of both string languages and tree languages.
ACGs are based on a small set of mathematical primitives that combine via simple composition rules. Consequently, the ACG framework is rather flexible.
Abstract categorial grammars are not intended as yet another grammatical formalism that would compete with other established formalisms. They should rather be seen as the kernel of a grammatical framework in which other existing grammatical models may be encoded.
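The relation between the abstract and the object language can be pictured with a toy sketch. It is not the ACG development environment, and every name in it (`realize`, `STRING_LEXICON`, the constants `MARY`, `PETER`, `SAW`) is hypothetical. Abstract terms are encoded as nested applications, and a lexicon interprets each abstract constant as a curried function in which every bound variable is used exactly once, mimicking a linear term.

```python
# An abstract term is either a constant name (str) or a pair
# (function_term, argument_term) standing for an application.

def realize(term, lexicon):
    """Interpret an abstract term compositionally through a lexicon."""
    if isinstance(term, tuple):
        function_term, argument_term = term
        return realize(function_term, lexicon)(realize(argument_term, lexicon))
    return lexicon[term]

# Lexicon towards strings: SAW takes its object first, then its subject.
STRING_LEXICON = {
    "MARY": "Mary",
    "PETER": "Peter",
    "SAW": lambda obj: lambda subj: subj + " saw " + obj,
}

# A second lexicon over the same abstract constants, towards a tree-like form.
TREE_LEXICON = {
    "MARY": ("NP", "Mary"),
    "PETER": ("NP", "Peter"),
    "SAW": lambda obj: lambda subj: ("S", subj, ("VP", ("V", "saw"), obj)),
}

ABSTRACT_TERM = (("SAW", "PETER"), "MARY")   # the structure (SAW PETER) MARY

print(realize(ABSTRACT_TERM, STRING_LEXICON))
print(realize(ABSTRACT_TERM, TREE_LEXICON))
```

The same abstract structure yields a string under one lexicon and a tree under the other, which is one way to read the claim that the generated languages generalize both string languages and tree languages.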
Interaction Grammars (IGs) are a linguistic formalism that aims at modelling both the syntax and the semantics of natural languages according to the following principles:
An IG is a monotonic system of constraints, as opposed to a derivational/transformational system, and this system is multidimensional: at the syntactic level, basic objects are tree descriptions and at the semantic level, basic objects are Directed Acyclic Graph descriptions.
The synchronization between the syntactic and the semantic levels is realized in a flexible way by a partial function that maps syntactic nodes to semantic nodes.
Much in the spirit of Categorial Grammars, the resource sensitivity of natural language is built into the formalism: syntactic composition is driven by an operation of cancellation between polarized morphosyntactic features and, in parallel, semantic composition is driven by a similar operation of cancellation between polarized semantic features.
The formalism of IG stems from a reformulation of proof nets of Intuitionistic Linear Logic (which have very specific properties) in a model-theoretical framework, and it was at first designed for modelling the syntax of natural languages.
The relevance of new linguistic formalisms needs to be proved by experiments on real corpora. Parsing real corpora requires large-scale grammars and lexicons. There is a crucial lack of such resources for French, and all researchers engaged in NLP projects for French based on different formalisms are confronted with the same problem. Now, building large-scale grammars and lexicons for French demands a lot of time and human resources, and it is crucial to overcome the multiplicity of existing formalisms by developing common and reusable tools and data. This is the sense of two directions of research:
The modular organization of formal grammars in a hierarchy of classes allows the expression of linguistic generalizations and makes their development and maintenance possible on a large scale. To be used in NLP applications, such modular grammars have to be compiled into operational grammars. By analogy with the area of programming languages, we write source grammars in a language with a high abstraction level and then compile them automatically into object grammars, directly usable by NLP applications.
Considering the multiplicity of linguistic formalisms, it would be interesting to express the various source grammars that can be written in different formalisms in a common abstract language, and to compile them with the same tool associated with this language. XMG is a first experiment in this direction: for the moment, it allows the editing and the compilation of source grammars for TAGs and IGs. Moreover, we can hope that the use of a common language of syntactic description with a high level of abstraction makes it easier to reuse some parts of grammars from one formalism to another.
With the same concern for reusability, it is important to develop syntactic and semantic lexicons which contain only purely linguistic information and which are independent of the different existing grammatical formalisms. A mechanism must then be provided to combine these lexicons with the grammars built in the various formalisms. A convenient way of doing this is to design the entries of such lexicons in the form of feature structures and to associate feature structures with the elementary constructions of the grammars as well. Then, their anchoring in the lexicons is realized by unification of the two kinds of feature structures. The construction of a syntactic and a semantic lexicon for French can be envisaged either by acquisition from corpora or by reuse of existing lexical information.
The theory of implicit complexity is quite new and there are still many things to do. So, it is really important to translate current theoretical tools into real applications; this should allow us to validate and guide our hypotheses. In order to do so, three directions are being explored.
First order functional programming. A first prototype, called Icar, has been developed and should be integrated into Elan (http://elan.loria.fr).
Extracting programs from proofs. Here, one should build logical theories in which programs extracted via the Curry-Howard isomorphism are efficient.
Application to mobile code systems. This work is starting in collaboration with the INRIA Cristal and Mimosa project-teams.
Leopar is a parser for natural languages which is based on the formalism of Interaction Grammars (IG). It uses a parsing principle, called ``electrostatic parsing'', which is based on neutralizing opposite polarities. A positive polarity corresponds to an available linguistic feature and a negative one to an expected feature.
Parsing a sentence with an Interaction Grammar (IG) consists in first selecting a lexical entry for each of its words. A lexical entry is an underspecified syntactic tree, in other words a tree description. Then, all selected tree descriptions are combined by partial superposition guided by the aim of neutralizing polarities: two opposite polarities are neutralized by merging their support nodes. Parsing succeeds if the process ends with a minimal and neutral tree. As IGs are based on polarities and underspecified trees, Leopar uses some specific and non-trivial data structures and algorithms.
The electrostatic principle has been studied intensively in Leopar. The theoretical problem of parsing IGs is NP-complete; the non-determinism usually associated with NP-completeness is present at two levels: when a description for each word is selected from the lexicon, and when a choice of which nodes to merge is made. Polarities have shown their efficiency in pruning the search tree for these two steps:
In the first step (tagging the words of the sentence with tree descriptions), we forget the structure of descriptions and only keep the bag of their features. In this case, parsing inside the formalism is greatly simplified because composition rules reduce to the neutralization of a negative feature-value pair f ← v by a dual positive feature-value pair f → v. As a consequence, parsing reduces to counting the positive and negative polarities present in the selected tagging for every pair (f, v): every positive occurrence counts for +1 and every negative occurrence for -1, and the sum must be 0.
In the second step (node-merging phase), polarities are used to cut off parsing branches whose trees contain too many uncancelled polarities.
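The counting criterion of the first step can be sketched as follows. This is a brute-force illustration only, not Leopar's algorithm; the toy lexicon, the final "." entry that expects a sentence, and the names `is_neutral` and `neutral_taggings` are all assumptions made here.

```python
from collections import Counter
from itertools import product

def is_neutral(tagging):
    """A tagging is a sequence of bags of (feature, value, polarity) triples,
    with polarity +1 (available) or -1 (expected). It passes the filter when
    the polarities sum to 0 for every (feature, value) pair."""
    balance = Counter()
    for bag in tagging:
        for feature, value, polarity in bag:
            balance[(feature, value)] += polarity
    return all(total == 0 for total in balance.values())

def neutral_taggings(words, lexicon):
    """Enumerate all taggings and keep the globally neutral ones."""
    choices = [lexicon[word] for word in words]
    return [tagging for tagging in product(*choices) if is_neutral(tagging)]

# Hypothetical lexicon: each word maps to a list of alternative feature bags.
TOY_LEXICON = {
    "Mary": [[("cat", "np", +1)]],
    "Peter": [[("cat", "np", +1)]],
    "saw": [
        [("cat", "s", +1), ("cat", "np", -1), ("cat", "np", -1)],  # transitive
        [("cat", "s", +1), ("cat", "np", -1)],                     # intransitive
    ],
    ".": [[("cat", "s", -1)]],
}

kept = neutral_taggings(["Mary", "saw", "Peter", "."], TOY_LEXICON)
print(len(kept))
```

In this toy example only the transitive entry for "saw" survives, since the intransitive one leaves a positive (cat, np) uncancelled; the structure of the descriptions is never inspected.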
A first prototype was developed until 2003 by Guillaume Bonfante and Bruno Guillaume. This implementation has many drawbacks and is no longer maintained.
A new implementation of Leopar was started in 2004. Guillaume Bonfante, Bruno Guillaume, Guy Perrier and Sylvain Pogodalla work on this new implementation. The current implementation (17,000 lines of OCaml) provides different running modes:
automatic parsing of a sentence or a set of sentences;
manual parsing (the user chooses the pair of nodes to merge);
visualization of grammars produced by XMG or of the set of tree descriptions associated with a given French word.
The main improvements with respect to the previous implementation are:
a finer data structure for tree descriptions: there are now two notions of precedence (direct and large) and there are arity constraints on nodes;
a new algorithm for the first step (tagging) which uses deterministic automata and provides a finer control on the way the filters are applied;
a new algorithm for the node-merging phase: more constraint propagation is used (hence the search space is reduced);
grammars created with XMG are now directly usable in Leopar;
a new graphical interface (using GTK) which is useful for debugging grammars.
The current implementation is available on the web (http://www.loria.fr/equipes/calligramme/leopar/) under the CeCILL License (http://www.cecill.info).
The current implementation comes with a medium-size coverage grammar for French (710 tree descriptions in the grammar produced with XMG). It also includes morphological and syntactic lexicons that cover the French examples of the TSNLP (Test Suite for Natural Language Processing).
XMG is based on two important concepts from logic programming, namely the Warren Abstract Machine and constraints on finite sets. It has been developed by Benoît Crabbé, Yannick Parmentier, Denys Duchier and Joseph Le Roux. The first release is available at http://sourcesup.cru.fr/xmg. It is now maintained by PhD students Yannick Parmentier and Joseph Le Roux.
At the current stage of implementation, XMG generates Tree Adjoining Grammars and Interaction Grammars, but the underlying formalism is generic, so it could be extended to other grammars such as dependency grammars or lexical functional grammars, depending on users' requests.
XMG has been used to design realistic grammars for French, that is to say grammars covering common linguistic knowledge and phenomena. Guy Perrier wrote an Interaction Grammar that is available with Leopar. Benoît Crabbé wrote a Tree Adjoining Grammar inspired by the well-known FTAG evaluated in less than 3 months. Claire Gardent is using XMG to design a Tree Adjoining Grammar with semantics. Joseph Le Roux is also designing an Interaction Grammar of coordination with XMG.
XMG also has users outside LORIA: Owen Rambow (Columbia University) is implementing a grammar for Arabic designed with XMG, and PhD students from the University of Pennsylvania also work with this tool.
In order to get actual lexicons to run Leopar, we needed to develop some lexical resources. The general architecture is the following:
Lexical resources are described in two different databases: one for morphological information and the other for syntactic aspects; the two databases are compiled into a morphosyntactic lexicon that combines the two kinds of information. In this compiled lexicon, feature structures are used to represent the morphosyntactic features associated with each inflected form.
From a metagrammar, through XMG, we generate anonymous tree descriptions that can occur in the targeted language (French); each tree description comes with a feature structure (called its interface) that describes how this tree should be anchored in the lexicon database.
Finally, we use feature structure unification to combine the grammatical and morphosyntactic databases. When unification between the feature structure of a word (given by the morphosyntactic database) and the interface of a tree description succeeds, the word anchors the corresponding tree description, which is then fully instantiated.
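The anchoring step can be sketched with flat feature structures. This is a simplified illustration under stated assumptions: real feature structures may be nested or contain disjunctive values, which is ignored here, and the feature names, the word entry and the function names `unify` and `anchor` are hypothetical.

```python
def unify(fs1, fs2):
    """Unify two flat feature structures given as dicts with atomic values.
    Return the merged structure, or None when two values clash."""
    result = dict(fs1)
    for feature, value in fs2.items():
        if feature in result and result[feature] != value:
            return None
        result[feature] = value
    return result

def anchor(word_entries, tree_descriptions):
    """Pair each word with every tree description whose interface unifies
    with the feature structure of the word."""
    anchored = []
    for form, word_fs in word_entries:
        for name, interface in tree_descriptions:
            merged = unify(word_fs, interface)
            if merged is not None:
                anchored.append((form, name, merged))
    return anchored

# Hypothetical morphosyntactic entry and tree description interfaces.
WORDS = [("dort", {"cat": "v", "lemma": "dormir", "trans": "no"})]
TREES = [
    ("intransitive_verb", {"cat": "v", "trans": "no"}),
    ("transitive_verb", {"cat": "v", "trans": "yes"}),
    ("common_noun", {"cat": "n"}),
]

for form, name, merged in anchor(WORDS, TREES):
    print(form, name, merged)
```

Here the word anchors only the intransitive description, because the other two interfaces clash with its features; keeping the interfaces separate from the trees is what lets the same lexicon serve several grammars.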
To this end, in addition to the tools to merge the different kinds of lexicons, Bruno Guillaume and Sylvain Pogodalla have developed a tool
This tool is also used in the concordancers we provide
Two students at École des Mines (Damien Auricchio and Nelson Da Silva) also worked on factorizing the morphological information of inflected forms and on comparison of UNITEX
A development environment for ACGs is being developed by Bruno Guillaume, Philippe de Groote and Sylvain Pogodalla. The main features are the abilities to read signatures and lexicons and to realize object terms from abstract ones. This new version integrates the ability to use features in types. Parsing (to build abstract terms from object terms) and example grammars are being developed.
Two papers show the ongoing development of the theory of proof denotations for classical propositional logic pursued by François Lamarche and Lutz Straßburger. The first paper presents two concrete proof net models, which differ by the semiring of coefficients that is used to tally (much like ordinary axiom links in linear logic) how axioms are used in a proof. The model based on the ordinary integers counts the number of times an axiom is used, but its cut-elimination process is not confluent. This desirable property is obeyed by the other model, based on the semiring of Booleans, which displays only the presence or absence of an axiom. These models have surprising properties with regard to what people have always expected about the relationship between proof denotations and computations (the ``Curry-Howard isomorphism''). They are also intimately related to complexity problems on Boolean satisfiability.
The second paper is a complete study of the category-theoretical properties of one of these two models. A hierarchy of axioms is proposed to give several possible answers to the question "what is a categorical model of Boolean logic", and the model is shown to be the free category (with atomic formulas as generators) for the right choice of axioms among this lot. This paper's approach has some things in common with the work of Führmann and Pym (who start with completely different concrete models), but it also shows several important differences, in particular the avoidance of any 2-categorical structure in the axiomatization.
We have studied the product-free associative Lambek calculus extended with a structural modality à la Girard, which allows the left structural rules (weakening, contraction, and exchange) to be performed in a controlled way. In particular, we have shown that any recursively enumerable language can be described by a categorial grammar based on this calculus.
We have studied the expressive power of the Abstract Categorial Grammars (ACG) by showing how to represent several grammatical formalisms as ACGs, including Tree Adjoining Grammars and Linear Context-Free Rewriting Systems.
We have studied some of the language-theoretic properties of the ACGs, such as the decidability and complexity of membership, universal membership, and emptiness. In particular, we have established the NP-completeness of membership for arbitrary lexicalized ACGs. We have also shown that this same problem is polynomial for second-order ACGs, by developing Earley-like parsing algorithms.
Joseph Le Roux has adapted the so-called Earley parsing algorithm to Interaction Grammars. Although the parsing problem is NP-complete, this tabular algorithm lets us reuse common material between the different parses of a sentence. That algorithm will soon be implemented in Leopar to test the actual improvement on real corpora.
For French, there exists to date no reference lexicon that would contain detailed extensive subcategorisation information (that is, information about the complements of natural language predicative items such as verbs, deverbal nouns and predicative adjectives).
In two papers, Claire Gardent (Langue et Dialogue team), Bruno Guillaume, Guy Perrier and Ingrid Falk (ATILF) propose a method for producing such a syntactic lexicon from the LADL tables (in other words, Maurice Gross' lexicon-grammar).
The LADL tables provide a systematic description of the syntactic properties of the functors of French, namely verbs, predicative nouns and adjectives. The subcategorisation information contained in this lexicon is both detailed and extensive.
Although the LADL tables are rich in content, their current format makes them difficult to use in NLP applications. The reasons for this are threefold:
The format itself is non-standard. In NLP applications, subcategorisation information is standardly gathered within a syntactic lexicon which associates with each predicative item the set of its possible subcategorisation frames. Further, subcategorisation frames are usually represented by a set of feature structures where each element in the set encodes the linguistic properties either of the verb or of one of the arguments occurring in the frame being described. To be easily usable by NLP applications, it is highly desirable that such a syntactic lexicon be derived from the LADL tables.
The structure of the tables is either implicit in the headings or altogether absent (in the electronic version available). For instance, the dependency between columns is not marked; subsets of columns that describe an atomic disjunction need to be automatically recovered from the fact that the columns are adjacent and share the same feature in their heading.
The headings are non-standard and need to be translated into feature structure specifications that are more in line with current practice in syntactic annotation.
We propose a method for extracting an NLP-oriented syntactic lexicon from the LADL tables. In essence, this method aims at making the table structure explicit and at translating the headings into standard-practice feature structure notation. Specifically, it consists of the following three steps:
For each table, a graph is (manually) produced which represents the interpretation of the table. This graph makes the table structure explicit and translates the headings into path equations.
A graph traversal algorithm is specified such that, given a graph and a table, it produces for each entry in that table the set of subcategorisation frames associated by the table with that entry. The resulting lexicon is called a LADL-lexicon and closely reflects the content of the LADL table. Some of the information it contains is not currently used by most NLP tools, in particular parsers and surface realisers.
A simplification algorithm is specified such that, given a LADL-lexicon, it produces an NLP-lexicon. The NLP-lexicon is a simplified version of the LADL-lexicon in which only the features relevant for parsing and generation are preserved, and which only partially reflects the content of the LADL table. NLP applications are expected to work with this lexicon.
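The three steps above can be pictured with a small sketch. Everything in it is an assumption made for illustration: the graph encoding (column nodes, alternatives, conjunctions), the column headings, the feature paths and the toy table rows are hypothetical and do not reproduce the actual LADL tables or the authors' algorithms. It only shows how a hand-written graph can turn the marks of a table row into a set of frames, and how a later pass can keep only the features an NLP tool needs.

```python
from itertools import product

# Step 1 (manual): a graph interpreting one table. Hypothetical encoding:
#   ("col", heading, equations)  contributes `equations` when the column is
#                                marked "+" for the entry (heading None means
#                                the equations always apply);
#   ("alt", children)            alternatives, each yielding distinct frames;
#   ("all", children)            all children must hold, equations are merged.
def col(heading, equations):
    return ("col", heading, equations)

def alt(*children):
    return ("alt", children)

def all_of(*children):
    return ("all", children)


# Step 2: traverse the graph for one table row and collect its frames.
def frames(node, row):
    kind = node[0]
    if kind == "col":
        _, heading, equations = node
        if heading is None or row.get(heading, False):
            return [dict(equations)]
        return []
    if kind == "alt":
        return [frame for child in node[1] for frame in frames(child, row)]
    if kind == "all":
        merged_frames = []
        for parts in product(*(frames(child, row) for child in node[1])):
            merged = {}
            for part in parts:
                merged.update(part)
            merged_frames.append(merged)
        return merged_frames
    raise ValueError(f"unknown node kind: {kind}")

def ladl_lexicon(graph, table):
    return {entry: frames(graph, row) for entry, row in table.items()}


# Step 3: keep only the feature paths relevant to the target NLP tool.
def nlp_lexicon(lexicon, kept_paths):
    simplified = {}
    for entry, entry_frames in lexicon.items():
        unique = []
        for frame in entry_frames:
            reduced = {p: v for p, v in frame.items() if p in kept_paths}
            if reduced not in unique:
                unique.append(reduced)
        simplified[entry] = unique
    return simplified


# Hypothetical toy data: two rows, three columns.
TABLE = {
    "aimer": {"N0 =: Nhum": True, "N1 =: Nhum": True, "N1 =: Qu P": True},
    "rencontrer": {"N0 =: Nhum": True, "N1 =: Nhum": True, "N1 =: Qu P": False},
}
GRAPH = all_of(
    col(None, {"cat": "v"}),
    col("N0 =: Nhum", {"subj.cat": "np"}),
    alt(
        col("N1 =: Nhum", {"obj.cat": "np", "obj.human": "+"}),
        col("N1 =: Qu P", {"obj.cat": "s", "obj.comp": "que"}),
    ),
)
LADL = ladl_lexicon(GRAPH, TABLE)
NLP = nlp_lexicon(LADL, {"cat", "subj.cat", "obj.cat"})
```

In this toy setting the two adjacent `N1` columns are read as a disjunction, so the first entry receives two frames and the second only one; the simplification step then drops the `obj.human` and `obj.comp` features.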
Using XMG, Guy Perrier has developed an interaction grammar for French. The methodology is inspired by Benoit Crabbé, who has developed a large French TAG.
The source grammar is composed of 312 classes organized in an inheritance hierarchy with two operators, conjunction and disjunction. The leaves of the hierarchy describe elementary phenomena of the grammar. Conjunctions and disjunctions express two ways of representing complex phenomena: for instance, a particular diathesis for a verb can result from the conjunction of classes representing specific realizations of its arguments, and the realization of a particular predicate-argument structure can be expressed by the disjunction of the classes representing the different diatheses.
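As a rough illustration of how a hierarchy built from conjunctions and disjunctions expands into a larger set of compiled descriptions, here is a minimal sketch. It is not XMG: the class names are hypothetical, leaves are treated as opaque names rather than tree fragments, and no description solving takes place. It only enumerates the combinations of leaf classes that a top-level class stands for.

```python
from itertools import product

# Hypothetical miniature hierarchy. Each class is a leaf, a conjunction
# ("and") of other classes, or a disjunction ("or") of other classes.
CLASSES = {
    "CanonicalSubject": ("leaf",),
    "RelativisedSubject": ("leaf",),
    "CanonicalObject": ("leaf",),
    "CliticObject": ("leaf",),
    "ActiveVerb": ("leaf",),
    "PassiveVerb": ("leaf",),
    "ByAgent": ("leaf",),
    "Subject": ("or", ["CanonicalSubject", "RelativisedSubject"]),
    "Object": ("or", ["CanonicalObject", "CliticObject"]),
    "ActiveDiathesis": ("and", ["Subject", "ActiveVerb", "Object"]),
    "PassiveDiathesis": ("and", ["Subject", "PassiveVerb", "ByAgent"]),
    "TransitiveVerb": ("or", ["ActiveDiathesis", "PassiveDiathesis"]),
}


def expand(name, classes):
    """Return the set of leaf combinations (frozensets) denoted by `name`."""
    definition = classes[name]
    kind = definition[0]
    if kind == "leaf":
        return {frozenset([name])}
    children = [expand(child, classes) for child in definition[1]]
    if kind == "or":
        # A disjunction offers the alternatives of each of its children.
        return set().union(*children)
    if kind == "and":
        # A conjunction picks one alternative per child and merges them.
        return {frozenset().union(*choice) for choice in product(*children)}
    raise ValueError(f"unknown class kind: {kind}")


COMPILED = expand("TransitiveVerb", CLASSES)
```

Here twelve source classes yield six combinations for `TransitiveVerb` (four active, two passive), which gives an idea of why the compiled grammar contains more descriptions than the source grammar has classes.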
The compiled grammar is composed of 710 tree descriptions mainly covering the following phenomena of the French syntax:
most subcategorisation frames for verbs, predicative adjectives and nouns,
active, passive, middle and reflexive diatheses combined with personal and impersonal subject constructions,
grammatical words and related syntactic constructions (clitics, personal, relative and interrogative pronouns, complementizers, prepositions, negations, auxiliary verbs, ...),
some phenomena that are hard to model, such as pied-piping in relative and interrogative clauses, islands for wh-extraction, long-distance dependencies related to negative expressions (``ne...aucun'', ``ne...personne''), past participle agreement in the presence of the auxiliary ``avoir'', control of the subject of infinitives, ...
The grammar is in the process of being evaluated on the TSNLP test suite.
The goal is to determine and guarantee, by static analysis, the resources that are necessary to run a system. By resources, we mean heap memory, stack size, the size of function output values, the running time of a program, and so on. One original feature of our approach is that our ideas are taken from logic, type theory and termination methods. This field of research is called implicit computational complexity.
More precisely, our goals split into two complementary points. The first point concerns quasi-interpretations, on which a survey is available. Our objective is to demonstrate that this approach is feasible. For this, it is necessary to have heuristics for finding quasi-interpretations of programs. We then try to extend the quasi-interpretation method in order to analyze more algorithms more easily. To this end, we have introduced the concept of ``sup-interpretation''; this result has just been accepted at FLOPS'06. The long-term goal is to have methods for analyzing functional and imperative programs. Another direction that we are currently exploring is the use of automata theory to predict resources. Lastly, as a shorter-term goal, we work within a Pessoa PAI with R. Kähle and I. Oitavem to characterize small complexity classes such as NC^{k}. We also work on the more fundamental question of understanding the BSS computational model over the real numbers.
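To give an idea of what a quasi-interpretation looks like, here is a minimal sketch based on the usual definition: each function symbol is assigned a monotone function over the natural numbers such that, for every rewrite rule l -> r of the program, the interpretation of l is at least that of r. The sketch only tests this rule condition on a small grid of sample values, so it can refute a candidate assignment but cannot prove it correct; the term encoding, the function names and the addition program are assumptions chosen for the example, not the heuristics mentioned above.

```python
from itertools import product

# Terms: a variable is a plain string; an application is a tuple
# (symbol, arg1, ..., argn). Constants are applications without arguments.

def evaluate(term, assignment, env):
    """Interpret `term` using `assignment` (symbol -> function) and `env`."""
    if isinstance(term, str):
        return env[term]
    symbol, *args = term
    return assignment[symbol](*(evaluate(a, assignment, env) for a in args))

def variables(term):
    if isinstance(term, str):
        return {term}
    return set().union(*(variables(a) for a in term[1:]))

def violations(rules, assignment, samples=range(6)):
    """Return, for each rule refuted on the sample grid, a first witness.

    A rule l -> r is refuted when the interpretation of l is strictly smaller
    than that of r for some sampled values of the variables.
    """
    refuted = []
    for lhs, rhs in rules:
        names = sorted(variables(lhs) | variables(rhs))
        for values in product(samples, repeat=len(names)):
            env = dict(zip(names, values))
            if evaluate(lhs, assignment, env) < evaluate(rhs, assignment, env):
                refuted.append((lhs, rhs, env))
                break
    return refuted


# Unary addition:  add(0, y) -> y   and   add(S(x), y) -> S(add(x, y))
ADD_RULES = [
    (("add", ("0",), "y"), "y"),
    (("add", ("S", "x"), "y"), ("S", ("add", "x", "y"))),
]

# Candidate assignment: 0 is 0, S adds one, add is the sum of its arguments.
GOOD = {"0": lambda: 0, "S": lambda x: x + 1, "add": lambda x, y: x + y}

# A candidate that fails: interpreting add by max breaks the second rule.
BAD = {**GOOD, "add": lambda x, y: max(x, y)}
```

With the additive assignment, `violations(ADD_RULES, GOOD)` finds no counterexample on the grid, whereas `violations(ADD_RULES, BAD)` reports the second rule. An additive assignment such as `GOOD` bounds the size of computed values polynomially in the size of the inputs, which is the kind of resource information this approach aims at.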
This year we took a new research direction by considering computer viruses. Indeed, denial-of-service attacks, for which memory is the critical resource, may for example cause a ``buffer overflow''. We can therefore expect that the static analyses we develop could apply here, whether for high-level programming languages, with quasi-interpretations, or at a low level, with the methods developed by Marion & Moyen based on Petri nets and linear algebra.
Calligramme is part of the ``Ingénierie des langues, du document, de l'information scientifique et culturelle'' theme of the ``contrat de plan État-Région''. Calligramme's contributions range over the syntactic and semantic analysis of natural language, the building of wide-coverage lexical resources, and the development of software specialized for those tasks.
Calligramme is part of the ``Qualité et sûreté des Logiciels (QSL)'' theme of the ``contrat de plan État-Région''. Jean-Yves Marion is head of the QSL theme.
Web page at http://qsl.loria.fr
Calligramme is involved in the ACI Demonat, in section ``Nouvelles interfaces des mathématiques'', together with the ``Logique'' group of ``Université de Savoie'' and the ``TALaNa'' team of ``Université Paris 7''. The project concerns the parsing and the checking of mathematical proofs written in natural language.
Calligramme is involved in the ACI CRISS, in section ``Sécurité informatique''. Its purpose, which can be read from the full title, is ``Contrôle de ressources et d'interfaces pour les systèmes synchrones''. It is headed by Roberto Amadio at the University of Marseille, and the coordinator on Calligramme's side is Jean-Yves Marion.
Web page at http://www.pps.jussieu.fr/~amadio/Criss/criss.html
Géocal, a ``nouvelles interfaces des mathématiques'' ACI (2003-2006), brings together several research teams in both mathematics and computer science and is concerned, as its name implies, with the application to computer science of techniques developed for modern geometry. It is headed by Thomas Ehrhard at the CNRS in Marseille, and the coordinator on Calligramme's side is Jean-Yves Marion.
Web page at http://iml.univmrs.fr/~ehrhard/geocal/geocal.html
Headed by Eric Goubault, this three-year action (starting in November) is the direct descendant of Géocal and of a smaller ACI that ended in 2005. Its aims are the study and development of algebraic invariants of computation, inspired by traditional homology and homotopy in algebraic topology. The coordinator on Calligramme's side is François Lamarche.
Calligramme, through Jean-Yves Marion, is a participant in the Ministry of Industry RNTL project Averroes.
Web page: http://wwwverimag.imag.fr/AVERROES/
Calligramme is involved in the LexSynt project, in which thirteen French-speaking research teams take part. It aims at developing a wide-coverage syntactic lexicon for French. In order to be usable in various NLP applications, this lexicon is independent of any grammatical formalism.
Web page: http://lexsynt.inria.fr
Calligramme is involved in the European network CoLogNET (Computational Logic Network) on the themes of logic methodology and foundational tools, and logic and natural language processing.
Philippe de Groote and Sylvain Salvati visited Makoto Kanazawa (NII, Tokyo) from February 12th to February 27th.
Philippe de Groote and Sylvain Pogodalla visited Carl Pollard (Ohio State University) from November 30th to December 6th.
Ryo Yoshinaka (Makoto Kanazawa's PhD student) visited the Calligramme Project from December 8th to December 21st.
Guillaume Bonfante has been vice president of the hiring committee, section 27, of the INPL since April 2003.
Guillaume Bonfante has been an elected member of the scientific council of the INPL since July 2003.
Guillaume Bonfante is a member of the engineering part of the Comipers hiring committee at LORIA.
Adam Cichon was elected member of the ``Conseil National des Universités'' (CNU), section 27.
Philippe de Groote is President of the INRIA-Lorraine Projects Committee (starting September 2004), and a member of INRIA's evaluation board.
Philippe de Groote is a member of the LORIA management board, and of the LORIA laboratory council.
Philippe de Groote is an associate editor of the journal Higher-Order and Symbolic Computation. He belongs to the editorial boards of the series Papers in Formal Linguistics and Logic (Bulzoni, Roma) and Cahiers du Centre de Logique (Academia-Bruylant, Louvain-la-Neuve).
Philippe de Groote was a member of the program committees of LACL'05 and UNIF'05.
François Lamarche is a member of the Bureau of the Département de Formation Doctorale in Computer Science of the IAEM doctoral school.
François Lamarche heads the research (theses, postdocs and ingénieurs spécialistes) section of the Comipers hiring committee at LORIA.
François Lamarche was chairman (of both the Program Committee and the Organization Committee) of the ``Structures and Deductions 2005'' workshop (SD05, http://www.prooftheory.org/sd05/), which was held in Lisbon on July 16–17 as a satellite workshop of the ICALP 2005 international conference. The workshop's theme was the emergence of new methods in proof theory; the proceedings are available at http://www.ki.inf.tudresden.de/~paola/SD05/SD05Proc.pdf.
Jean-Yves Marion is a member of the steering committee of the International Workshop on Logic and Computational Complexity (LCC).
Jean-Yves Marion has been a member of the hiring committee (CS) at the University of Metz, section 27, since September 2004.
Jean-Yves Marion has been a member of the hiring committee at INPL (Professors and Lecturers), section 27, since February 2002.
Jean-Yves Marion was elected to the scientific council of INPL in July 2003 and is a member of the board.
Jean-Yves Marion initiated and organizes the monthly QSL seminars (http://qsl.loria.fr). Every seminar gathers between 10 and 40 participants. There have been 22 seminars since January 2003.
Guy Perrier is a member of the editorial board of the journal Traitement Automatique des Langues.
Guy Perrier is a member of the Program Committee of the conference TALN'2006.
Guy Perrier is a member of the Bureau of the Département de Formation Doctorale in Computer Science of the IAEM doctoral school.
Jean-Yves Marion is in charge of the option ``Ingénierie des systèmes informatiques'' at École des Mines starting in September.
Jean-Yves Marion took part in the creation of the computational biology curriculum at École des Mines and is in charge of the course on ``Bases et banques de données''.
Guy Perrier heads the specialization ``Traitement Automatique des Langues'', which is common to the masters in computer science and in cognitive sciences of the universities Nancy 2 and Henri Poincaré.
Guy Perrier is in charge of the organization of the course on tools and algorithms for the parsing of natural languages, which he is teaching with Bertrand Gaiffe in the master's specialization ``Traitement Automatique des Langues''.
Philippe de Groote and Sylvain Pogodalla gave a course on ACGs at ESSLLI 2005 (European Summer School in Logic, Language and Information), in August in Edinburgh.
Philippe de Groote is teaching the course ``Sémantique computationnelle'' of the Nancy master specialization ``Traitement Automatique des Langues''.
Philippe de Groote and Gérard Huet are teaching the course ``Structures Informatiques et Logiques pour la Modélisation Linguistique'' of the ``Master Parisien de Recherche en Informatique''.
Philippe de Groote has been supervising the thesis work of Sylvain Salvati.
Guy Perrier is supervising the thesis work of Joseph Le Roux.
Jean-Yves Marion has been supervising the thesis work of Romain Péchoux since September 2004.
Jean-Yves Marion and Simona Ronchi Della Rocca (University of Turin) are co-supervising the thesis work of Marco Gaboardi.
Jean-Yves Marion and Guillaume Bonfante are co-supervising the thesis work of Mathieu Kaczmarek.
Sylvain Pogodalla advised three third-year students at École des Mines (Amal Laouaj, Guillaume Princelle and Lisa Rouhban) for a two-month internship devoted to studying French tokenization.
Bruno Guillaume advised two second-year students at École des Mines (Damien Auricchio and Nelson Da Silva) for a three-month internship devoted to factorizing the morphological information of inflected forms and comparing morphological lexicons.
Bruno Guillaume advised two second-year students at École des Mines (Vincent Domange and Romain Jacquier) for a three-month internship devoted to interfacing lexicons and anonymous grammars.
Philippe de Groote was jury member for Sylvain Salvati's thesis, Nancy, June 13.
Philippe de Groote was jury president for Benoît Crabbé's thesis, Nancy, June 14.
Philippe de Groote was referee and jury member for Hugo Herbelin's HDR, Paris 11, December 7.
Jean-Yves Marion was jury member for Clara Bertolissi's thesis, Nancy, October 28.
Jean-Yves Marion was jury member for Julien Fondrevelle's thesis, Nancy, November 10.
Sylvain Salvati defended his thesis on June 13, 2005 (jury: Éric de la Clergerie, Alexander Dikovsky, Philippe de Groote, Dale Miller, Glynn Morrill, Karl Tombre).
Bruno Guillaume and Guy Perrier attended the "Journée ATALA : Interface lexique-grammaire et lexiques syntaxiques et sémantiques" on March 12, in Paris. They presented a talk and a poster.
Philippe de Groote, Bruno Guillaume, Sylvain Pogodalla and Sylvain Salvati attended the Demonat workshop in Nancy in April.
Philippe de Groote attended the LACL'05 conference, in Bordeaux, April 28–30.
Jean-Yves Marion gave an invited talk, "Data tiering as a complexity tool which jumps from discrete to real computation", at the International Workshop "Computations on the continuum", June 2005, Lisbon.