

Section: New Software and Platforms

Extreme UGC corpus

Functional Description

The Extreme UGC corpus is a French three-domain data set focusing on user-generated content. It is made up of noisy question headlines from a cooking forum, together with live game chat logs and associated forums from two popular online games (MINECRAFT and LEAGUE OF LEGENDS). Building such an out-of-domain corpus allowed us to probe the limits of our current normalization approaches. The corpus is currently annotated with part-of-speech tags, and we plan to add other annotation layers.

  • Contact: Djame Seddah