Section: Software and Platforms
Leopar
Participants : Bruno Guillaume [correspondent] , Guy Perrier, Tatiana Ekeinhor.
Software description
Leopar is a parser for natural languages based on the formalism of Interaction Grammars [40]. It uses a parsing principle called “electrostatic parsing”, which consists in neutralizing opposite polarities. A positive polarity corresponds to an available linguistic feature and a negative one to an expected feature.
Parsing a sentence with an Interaction Grammar (IG) consists in first selecting a lexical entry for each of its words. A lexical entry is an underspecified syntactic tree, in other words a tree description. Then, all selected tree descriptions are combined by partial superposition guided by the aim of neutralizing polarities: two opposite polarities are neutralized by merging their support nodes. Parsing succeeds if the process ends with a minimal and neutral tree. As IGs are based on polarities and underspecified trees, Leopar uses some specific and non-trivial data structures and algorithms.
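The neutralization step can be illustrated with a minimal Python sketch. It is not Leopar's implementation: the `Node` class, the polarity symbols, and the composition table in `combine` are simplifying assumptions made for illustration (a node is reduced to a flat feature map, and the tree structure is ignored).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical polarity symbols: available, expected, neutral, saturated.
POS, NEG, NEUTRAL, SATURATED = "+", "-", "=", "<>"


@dataclass
class Node:
    # feature name -> (value, polarity); an illustrative flat encoding
    features: Dict[str, Tuple[str, str]] = field(default_factory=dict)


def combine(p: str, q: str) -> Optional[str]:
    """Compose two polarities; None means the merge is forbidden."""
    if p == NEUTRAL:
        return q
    if q == NEUTRAL:
        return p
    if {p, q} == {POS, NEG}:
        return SATURATED  # opposite polarities neutralize each other
    return None  # e.g. two positives, or anything against a saturated one


def merge(a: Node, b: Node) -> Optional[Node]:
    """Superpose two nodes; return None when they are incompatible."""
    merged = dict(a.features)
    for name, (value, pol) in b.features.items():
        if name in merged:
            old_value, old_pol = merged[name]
            pol = combine(old_pol, pol)
            if old_value != value or pol is None:
                return None
        merged[name] = (value, pol)
    return Node(merged)


def is_neutral(node: Node) -> bool:
    """True when no positive or negative polarity is left on the node."""
    return all(pol not in (POS, NEG) for _, pol in node.features.values())
```

For instance, merging a node that provides a noun phrase (`cat` = `np`, positive) with a node that expects one (`cat` = `np`, negative) yields a node with no remaining positive or negative polarity, whereas merging two nodes that both provide a noun phrase is rejected.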
The electrostatic principle has been exploited intensively in Leopar. The theoretical problem of parsing IGs is NP-complete; the nondeterminism usually associated with NP-completeness is present at two levels: when a description for each word is selected from the lexicon, and when a choice of which nodes to merge is made. Polarities have proved efficient in pruning the search tree:
-
In the first step (tagging the words of the sentence with tree descriptions), we forget the structure of the descriptions and keep only the bag of their features. In this case, parsing inside the formalism is greatly simplified because the composition rules reduce to the neutralization of a negative feature-value pair by the dual positive feature-value pair. As a consequence, parsing reduces to counting the positive and negative polarities present in the selected tagging: for every feature-value pair (f, v), every positive occurrence counts for +1 and every negative occurrence for -1, and the sum must be 0.
-
Again in the tagging step, original methods were developed to filter out bad taggings. Each unsaturated polarity in the grammar induces constraints on the set of contexts in which it can be used: the unsaturated polarity must find a companion (i.e. a tree description able to saturate it), and the set of companions for the polarity can be computed statically from the grammar. Each lexical selection which contains an unsaturated polarity without one of its companions can be safely removed.
-
In the next step (node-merging phase), polarities are used to cut off parsing branches when their trees contain too many non-neutral polarities.
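The two tagging-step filters described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the toy lexicon, the entry names, and the `COMPANIONS` table are hypothetical, selections are enumerated naively, and word order is ignored; none of this reflects how Leopar actually stores or explores lexical selections.

```python
from collections import Counter
from itertools import product

# Each lexical entry is reduced to its bag of polarized features:
# (feature, value, sign), with sign +1 = available and -1 = expected.
# Hypothetical toy lexicon, not taken from a real Interaction Grammar.
LEXICON = {
    "Jean": {"np_jean": [("cat", "np", +1)]},
    "dort": {
        "v_intrans": [("cat", "np", -1)],
        "v_trans": [("cat", "np", -1), ("cat", "np", -1)],
    },
}

# Hypothetical companion table, assumed precomputed from the grammar:
# entry name -> one set of companion entries per unsaturated polarity.
COMPANIONS = {
    "np_jean": [{"v_intrans", "v_trans"}],
    "v_intrans": [{"np_jean"}],
    "v_trans": [{"np_jean"}, {"np_jean"}],
}


def is_globally_neutral(bags):
    """Counting filter: for every (feature, value) pair the signs sum to 0."""
    balance = Counter()
    for bag in bags:
        for feature, value, sign in bag:
            balance[(feature, value)] += sign
    return all(total == 0 for total in balance.values())


def has_all_companions(names):
    """Companion filter: each unsaturated polarity needs a companion nearby."""
    for i, name in enumerate(names):
        others = set(names[:i]) | set(names[i + 1:])
        for companion_set in COMPANIONS.get(name, []):
            if not (companion_set & others):
                return False
    return True


def neutral_selections(words):
    """Enumerate lexical selections and keep the globally neutral ones."""
    choices = [LEXICON[word].items() for word in words]
    kept = []
    for selection in product(*choices):
        names = tuple(name for name, _ in selection)
        if is_globally_neutral(bag for _, bag in selection):
            kept.append(names)
    return kept
```

On the toy sentence "Jean dort", `neutral_selections(["Jean", "dort"])` keeps only the selection using the intransitive entry, since the transitive one leaves an expected noun phrase unmatched. In this simplified form the companion check is weaker than the counting check, because it only tests for the presence of a companion and not for how many polarities that companion can saturate.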
Current state of the implementation
Leopar is presented and documented at http://leopar.loria.fr; an online demonstration page can be found at http://leopar.loria.fr/demo.
It is open-source (under the CECILL License, http://www.cecill.info) and is developed on the InriaGforge platform (http://gforge.inria.fr/projects/semagramme/).
The main features of the current software are:
-
interactive parsing (the user chooses the pair of nodes to merge),
-
visualization of grammars produced by XMG-2 or of sets of description trees associated to some word in the linguistic resources.
One of the difficulties with symbolic parsing is that several solutions can be produced for a single sentence, and we want to be able to rank them. Tatiana Ekeinhor, during her second-year Master's internship (from February to June 2013), implemented a ranker based on statistical techniques. Using the Sequoia TreeBank as a training corpus, she obtained an improvement of the system compared to the handcrafted rules.