ALPAGE - 2016 - Annual Activity Report

Section: New Software and Platforms

MElt

Maximum-Entropy lexicon-aware tagger

Keyword: Part-of-speech tagger

Functional Description

MElt is a freely available (LGPL) state-of-the-art sequence labeller designed to be trained on both an annotated corpus and an external lexicon. It was developed by Pascal Denis and Benoît Sagot within the Alpage team, a joint Inria and Université Paris-Diderot team in Paris, France. MElt can use either multiclass Maximum-Entropy Markov models (MEMMs) or multiclass perceptrons (multitrons) as its underlying statistical model. Its output is in the Brown format: one sentence per line, each sentence being a space-separated sequence of annotated words in the word/tag format.
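As an illustration of the Brown format described above, the following minimal Python sketch reads one line of tagger output back into (word, tag) pairs. It is not part of MElt: the function name parse_brown_line and the tags in the sample sentence are hypothetical, and the sketch assumes that tags never contain a slash, so the last "/" in each token separates the word from its tag.

```python
from typing import List, Tuple


def parse_brown_line(line: str) -> List[Tuple[str, str]]:
    """Split one Brown-format sentence into (word, tag) pairs.

    Assumption (not confirmed by the MElt documentation): a tag never
    contains "/", so the LAST slash of a token is the word/tag separator.
    This keeps words such as "1/2" intact.
    """
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if not sep or not word or not tag:
            raise ValueError(f"not a word/tag token: {token!r}")
        pairs.append((word, tag))
    return pairs


if __name__ == "__main__":
    # Hypothetical sample line; the tag names are for illustration only.
    sample = "The/DET cat/NC sleeps/V ./PONCT"
    for word, tag in parse_brown_line(sample):
        print(f"{word}\t{tag}")
```

Splitting on the last slash rather than the first is a deliberate choice here: words can themselves contain slashes (dates, fractions, URLs), whereas tags in a fixed tagset usually do not.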

MElt has been trained on various annotated corpora, using Alexina lexicons as the source of lexical information. As a result, the MElt package includes models for French, English, Spanish and Italian.

MElt also includes a normalization wrapper designed to help process noisy text, such as user-generated data retrieved from the web. This wrapper is only available for French and English. It was used for parsing web data in both languages: for English during the SANCL shared task (Google Web Bank), and for French when developing the French Social Media Bank (Facebook, Twitter and blog data).

Contact: Benoît Sagot
URL: https://www.rocq.inria.fr/alpage-wiki/tiki-index.php?page=MElt
