EN FR
EN FR




Bilateral Contracts and Grants with Industry
Bibliography




Bilateral Contracts and Grants with Industry
Bibliography


Section: Software

Automatic construction of distributional thesauri

Participant : Enrique Henestroza Anguiano [correspondant] .

FreDist is a freely-available (LGPL license) Python package that implements methods for the automatic construction of distributional thesauri.

We have implemented the context relation approach to distributional similarity, with various context relation types and different options for weight and measure functions to calculate distributional similarity between words. Additionally, FreDist is highly flexible, with parameters including: context relation type(s), weight function, measure function, term frequency thresholds, part-of-speech restrictions, filtering of numerical terms, etc.

Distributional thesauri for French are also available, one each for adjectives, adverbs, common nouns, and verbs. They have been constructed with FreDist and use the best settings obtained in an evaluation. We use the L'Est Republicain corpus (125 million words), Agence France-Presse newswire dispatches (125 million words) and a full dump of the French Wikipedia (200 million words), for a total of 450 million words of text.