Section: New Software and Platforms
The Bonsai PCFG-LA parser
Participants: Marie-Hélène Candito [correspondent], Djamé Seddah, Benoit Crabbé.
Web page:
http://alpage.inria.fr/statgram/frdep/fr_stat_dep_parsing.html
Alpage has developed, in support of the research papers [75], [67], [68], [122], a statistical parser for French named Bonsai, trained on the French Treebank. The parser outputs both a phrase structure and a projective dependency structure as specified in [66]. It operates in two sequential steps: (1) it produces a phrase structure analysis of each sentence, reusing the Berkeley implementation of a PCFG-LA, trained on French by Alpage; (2) it converts the resulting phrase structure trees to dependency parses, using a combination of heuristics and classifiers trained on the French Treebank. The parser currently outputs several well-known formats: Penn Treebank phrase structure trees, Xerox-like triples, and a CoNLL-like format for dependencies. The parser also comes with basic preprocessing facilities for elementary sentence segmentation and word tokenisation, which in principle allow it to process unrestricted text. However, it is believed to perform better on newspaper-like text.
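To make the second step more concrete, the following minimal Python sketch converts a Penn-style bracketed tree into unlabelled dependencies by head percolation. It is not Bonsai's code: the `HEAD_RULES` table, the function names, the hand-written example tree, and the simplified three-column output are all assumptions made for illustration, and the classifiers mentioned above are not modelled at all.

```python
# Toy head table: for each phrase label, child labels to try in priority order.
# These rules are illustrative assumptions, NOT Bonsai's actual heuristics.
HEAD_RULES = {
    "SENT": ["VN", "NP"],
    "VN": ["V"],
    "NP": ["NC", "NPP"],
}


def parse_bracketed(text):
    """Parse a Penn-style bracketed string into nested (label, body) tuples.

    body is a word (str) for preterminals, or a list of child nodes otherwise.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        if tokens[pos] != "(":      # preterminal: (TAG word)
            word = tokens[pos]
            pos += 2                # skip the word and the closing ")"
            return (label, word)
        children = []
        while tokens[pos] == "(":
            children.append(read())
        pos += 1                    # skip the closing ")"
        return (label, children)

    return read()


def to_dependencies(tree):
    """Return (words, heads); heads[i] is the 1-based head of word i+1, 0 = root."""
    words, heads = [], []

    def visit(node):
        label, body = node
        if isinstance(body, str):   # leaf: register the word, provisionally root
            words.append(body)
            heads.append(0)
            return len(words)       # 1-based index of this word
        child_heads = [(child[0], visit(child)) for child in body]
        head = child_heads[0][1]    # fallback: lexical head of the leftmost child
        for wanted in HEAD_RULES.get(label, []):
            matches = [idx for lab, idx in child_heads if lab == wanted]
            if matches:
                head = matches[0]
                break
        for _, idx in child_heads:  # attach every non-head child to the head
            if idx != head:
                heads[idx - 1] = head
        return head

    visit(tree)
    return words, heads


def to_conll_like(words, heads):
    """Render a simplified three-column layout: index, form, head index."""
    return "\n".join(f"{i}\t{w}\t{h}" for i, (w, h) in enumerate(zip(words, heads), 1))


if __name__ == "__main__":
    tree = parse_bracketed("(SENT (NP (DET Le) (NC chat)) (VN (V dort)))")
    print(to_conll_like(*to_dependencies(tree)))
```

Because each phrase contributes exactly one lexical head and all its other children attach to that head, a conversion of this kind over continuous constituents yields projective trees, which is consistent with the projective output described above. A real CoNLL-style file carries more columns (lemma, part of speech, dependency label, and so on) than this sketch prints.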
The parser is available under a GPL license.