Section: Application Domains
Shallow processing of e-mails
Participants: Benoît Sagot, Laurence Danlos.
Shallow processing is one of the most important NLP application domains. It includes, in particular, detecting named entities in a broad sense (person names, organization names, locations, addresses, date and time mentions, and others). Such detection serves many purposes, such as text normalization or even anonymization, but, more importantly, it supports the extraction of events and other kinds of structured information from text. This is what the new company Kwaga is trying to do on e-mails, facing difficulties related to the high level of noise that characterizes e-mail corpora (spelling mistakes, abbreviations, inter-e-mail structure, etc.). In 2009-2010, an ARITT contract was set up to study the usability of Alpage's SxPipe shallow processing chain for part of this purpose.
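To make the idea of rule-based entity detection and anonymization more concrete, here is a minimal sketch using Python's standard `re` module. It is not SxPipe's implementation: the three patterns (e-mail addresses, numeric dates, times), the `_LABEL` placeholder tokens, and the function names `detect_entities` and `anonymize` are hypothetical simplifications chosen for illustration. Real e-mail corpora would need far more robust grammars to cope with spelling errors and abbreviations.

```python
import re

# Hypothetical, deliberately simplified patterns; NOT SxPipe's actual grammars.
PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}(?:/\d{2,4})?\b")),  # e.g. 12/03 or 12/03/2010
    ("TIME", re.compile(r"\b\d{1,2}(?:h|:)\d{2}\b")),            # e.g. 14h30 or 14:30
]


def detect_entities(text):
    """Return non-overlapping (label, surface, start, end) tuples, left to right."""
    candidates = []
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            candidates.append((label, m.group(), m.start(), m.end()))
    # Sort by start position; on ties, prefer the longest match.
    candidates.sort(key=lambda e: (e[2], -e[3]))
    kept, last_end = [], -1
    for ent in candidates:
        if ent[2] >= last_end:  # skip matches overlapping an already kept one
            kept.append(ent)
            last_end = ent[3]
    return kept


def anonymize(text):
    """Replace each detected entity by a hypothetical placeholder token such as _DATE."""
    pieces, pos = [], 0
    for label, _, start, end in detect_entities(text):
        pieces.append(text[pos:start])
        pieces.append("_" + label)
        pos = end
    pieces.append(text[pos:])
    return "".join(pieces)


if __name__ == "__main__":
    print(anonymize("rdv le 12/03 a 14h30, ecrire a jean.dupont@example.com"))
```

Running the script prints `rdv le _DATE a _TIME, ecrire a _EMAIL`. Keeping detection separate from replacement means the same entity list could feed normalization, anonymization, or a downstream event-extraction step.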