Section: Software

Alpage's linguistic workbench, including SxPipe

Participants: Benoît Sagot [contact person], Rosa Stern, Marion Baranes, Damien Nouvel, Virginie Mouilleron, Pierre Boullier, Éric Villemonte de La Clergerie.

See also the web page http://lingwb.gforge.inria.fr/.

Alpage's linguistic workbench is a set of packages for corpus processing and parsing. Among these packages, SxPipe is of particular importance.

SxPipe [97] is a modular and customizable processing chain that applies a cascade of surface processing steps to raw corpora. It is used:

  • as a preliminary step before Alpage's parsers (e.g., FRMG);

  • for surface processing (named entity recognition, text normalization, etc.).
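The idea of a modular cascade can be illustrated with a minimal Python sketch. This is not SxPipe's actual code or interface: the step names (`normalize_whitespace`, `split_sentences`, `tokenize`) and the `run_cascade` helper are hypothetical, and the regular expressions are deliberately naive. The sketch only shows how independent surface processing steps can be chained, reordered, or replaced.

```python
import re
from typing import Callable, List

# A step maps a list of text units to a new list of text units.
# (Hypothetical interface, chosen for illustration only.)
Step = Callable[[List[str]], List[str]]


def normalize_whitespace(units: List[str]) -> List[str]:
    """Collapse runs of whitespace into single spaces."""
    return [re.sub(r"\s+", " ", u).strip() for u in units]


def split_sentences(units: List[str]) -> List[str]:
    """Naive segmenter: split after '.', '!' or '?' followed by whitespace."""
    out: List[str] = []
    for u in units:
        out.extend(s for s in re.split(r"(?<=[.!?])\s+", u) if s)
    return out


def tokenize(units: List[str]) -> List[str]:
    """Naive tokenizer: separate words and punctuation marks with spaces."""
    return [" ".join(re.findall(r"\w+|[^\w\s]", u)) for u in units]


def run_cascade(text: str, steps: List[Step]) -> List[str]:
    """Apply each step in order; the output of one step feeds the next."""
    units = [text]
    for step in steps:
        units = step(units)
    return units


if __name__ == "__main__":
    pipeline = [normalize_whitespace, split_sentences, tokenize]
    for sentence in run_cascade("Hello  world. How are\nyou?", pipeline):
        print(sentence)
```

Because every step shares the same signature, customizing the chain amounts to editing the `pipeline` list, for example by dropping the tokenizer or inserting a language-specific step.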

Developed for French and for other languages, SxPipe includes, among other components, several named entity recognition modules that operate on raw text, a sentence segmenter and tokenizer, a spelling corrector and compound word recognizer, and an original context-free pattern recognizer used by several specialized grammars (numbers, impersonal constructions, quotations, etc.). In 2012, SxPipe received renewed attention in four directions:

  • Support of new languages, most notably German (although this is still at a very preliminary stage of development);

  • Analysis of unknown words, in particular in the context of the ANR project EDyLex and of the collaboration with viavoo; this involves (i) new tools for the automatic pre-classification of unknown words (acronyms, loan words, etc.) and (ii) new morphological analysis tools, most notably automatic tools for constructional morphology (both derivational and compositional), following the results of dedicated corpus-based studies;

  • Development of new local grammars for detecting new types of entities, such as chemical formulae or dimensions, in the context of the PACTE project.
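
As a rough illustration of what detecting dimensions in raw text involves, the following Python sketch uses a single regular expression. It is a simplified stand-in, not one of SxPipe's local grammars, which rely on its context-free pattern recognizer. The unit list, the `DIMENSION` pattern, and the `find_dimensions` function are assumptions made for this example.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately small unit inventory; longer units come first
# so that "mm" is tried before "m".
UNITS = r"mm|cm|km|m|kg|g|l"

DIMENSION = re.compile(
    r"(?<![\w.,])"          # do not start in the middle of a word or number
    r"(\d+(?:[.,]\d+)?)"    # value, with '.' or ',' as decimal separator
    r"\s?"                  # optional space between value and unit
    rf"({UNITS})"           # unit symbol
    r"\b"                   # the unit must end at a word boundary
)


def find_dimensions(text: str) -> List[Tuple[str, str]]:
    """Return (value, unit) pairs, with the decimal comma normalized to a dot."""
    return [
        (m.group(1).replace(",", "."), m.group(2))
        for m in DIMENSION.finditer(text)
    ]


if __name__ == "__main__":
    print(find_dimensions("Une plaque de 2,5 cm pour 300 g."))
```

A context-free formalism can go further than this, for instance by handling nested or recursive structures such as parenthesized groups in chemical formulae, which a flat regular expression cannot describe in general.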