

Section: New Results

Graph-based Text Analytics

Participants: Fragkiskos Malliaros (in collaboration with Konstantinos Skianis and Michalis Vazirgiannis, École Polytechnique)

Text categorization is a core task in many text mining applications. In contrast to the traditional Bag-of-Words approach, we consider the Graph-of-Words model, in which each document is represented by a graph that encodes relationships between its terms. Under this formulation, we treat term weighting as a node ranking problem: the importance of a term is determined by the importance of the corresponding node in the graph, measured with node centrality criteria. We have also introduced novel graph-based weighting schemes that enrich the graphs with word-embedding distances in order to reward or penalize the importance of semantically close terms [39]. Our methods produce more discriminative feature weights for text categorization and outperform existing frequency-based criteria, which also highlights the importance of graph-based methods in text analytics and in natural language processing in general.
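
The idea of weighting terms by node centrality can be illustrated with a minimal sketch. It is not the implementation from [39]. It assumes an undirected co-occurrence graph built with a sliding window of three tokens and uses normalized degree as the centrality criterion; the function names, the window size, and the sample sentence are all hypothetical choices made for illustration. The word-embedding enrichment is not modeled here.

```python
from collections import defaultdict


def graph_of_words(tokens, window=3):
    """Build an undirected graph-of-words as an adjacency map.

    Two distinct terms are linked if they co-occur within `window`
    consecutive tokens. The window size is an illustrative assumption.
    """
    adjacency = defaultdict(set)
    for i, term in enumerate(tokens):
        adjacency[term]  # ensure every term appears as a node
        for other in tokens[i + 1 : i + window]:
            if other != term:
                adjacency[term].add(other)
                adjacency[other].add(term)
    return dict(adjacency)


def degree_centrality_weights(adjacency):
    """Weight each term by its normalized degree (an assumed criterion)."""
    n = len(adjacency)
    if n <= 1:
        return {term: 0.0 for term in adjacency}
    return {term: len(neigh) / (n - 1) for term, neigh in adjacency.items()}


# Hypothetical toy document, tokenized by whitespace.
doc = "graph based text mining uses graph models for text".split()
weights = degree_centrality_weights(graph_of_words(doc, window=3))

# Print terms from most to least central.
for term, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.2f}")
```

In this sketch a term's weight depends on how many distinct terms it co-occurs with, not on how often it appears, which is the key difference from frequency-based weighting. Other centrality measures could replace degree without changing the graph construction.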