Section: Application Domains

Metric learning for natural language processing

The analysis of large-scale datasets for unsupervised (clustering) and supervised (classification, regression) learning requires the design of advanced models that capture the geometry of the input data. We believe that optimal transport is a key tool to address this problem because (i) many of these datasets are composed of histograms (social network activity, image signatures, etc.), and (ii) optimal transport makes use of a ground metric that enhances the performance of classical learning algorithms, as illustrated for instance in [114].

Some of the theoretical and numerical tools developed by our team, most notably Wasserstein barycenters [46], [71], are now becoming mainstream in machine learning [67], [114]. In its simplest (convex) form, where one seeks only to maximize pairwise Wasserstein distances, metric learning corresponds to the congestion problem studied by G. Carlier and collaborators [102], [74]. We will elaborate on this connection both to carry out theoretical analysis and to develop numerical schemes (see for instance our previous work [64]).
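
To make the convexity of this simplest form concrete, one possible way to write it is sketched below. The notation is introduced here for illustration only and is not taken from the cited works: $h_1, \dots, h_N$ denote the input histograms over $n$ bins, $M$ a ground cost matrix, and $\mathcal{M}$ an assumed convex set of admissible ground costs (for instance, metric matrices subject to a normalization constraint).

```latex
\[
  W_M(p, q) \;=\; \min_{P \in U(p, q)} \langle P, M \rangle,
  \qquad
  U(p, q) \;=\; \bigl\{ P \in \mathbb{R}_+^{n \times n} \;:\; P \mathbf{1} = p,\ P^\top \mathbf{1} = q \bigr\},
\]
\[
  \max_{M \in \mathcal{M}} \; \sum_{1 \le i < j \le N} W_M(h_i, h_j).
\]
```

For fixed histograms, $M \mapsto W_M(p, q)$ is a pointwise minimum of linear functions of $M$ and is therefore concave. Under the stated assumption that $\mathcal{M}$ is convex, maximizing the sum of pairwise distances is thus a convex program.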

We aim at developing novel variational estimators extending classification and regression energies (SVM, logistic regression [129]) and kernel methods (see [173]). One of the key bottlenecks is the design of numerical schemes to learn an optimal metric for this purpose, extending the method of Marco Cuturi [113] to large-scale problems and more general estimators. Our main targeted application is natural language processing. The analysis and processing of large text corpora is becoming a key problem at the interface between linguistics and machine learning [50]. Extending classical machine learning methods to this field requires designing suitable metrics over both words and bags-of-words (i.e. histograms). Optimal transport is thus a natural candidate to bring innovative solutions to these problems. In a collaboration with Marco Cuturi (Kyoto University), we aim at unleashing the power of transportation distances by performing ground distance learning on large text databases. This requires lifting previous work on distances between words (see in particular [159]) to distances between bags-of-words using transport and metric learning.
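
The lifting step described above can be illustrated with a minimal sketch. It is not the method of [113] or [159]: the ground metric is fixed rather than learned, and the transport cost is approximated with standard Sinkhorn iterations (entropic regularization), chosen here only for brevity. The vocabulary, the two-dimensional word embeddings, the toy documents, and the names `sinkhorn_cost` and `bag_of_words` are all hypothetical.

```python
import numpy as np

def sinkhorn_cost(p, q, M, reg=0.5, n_iter=200):
    """Entropy-regularized transport cost between histograms p and q.

    M is the ground cost between bins (here: words). Returns the cost
    <P, M> of the regularized plan P, and P itself. This approximates,
    and slightly overestimates, the exact optimal transport cost.
    """
    K = np.exp(-M / reg)              # Gibbs kernel built from the ground cost
    u = np.ones_like(p)
    for _ in range(n_iter):           # alternate scaling onto the two marginals
        v = q / (K.T @ u)
        u = p / (K @ v)
    P = u[:, None] * K * v[None, :]   # regularized transport plan
    return float(np.sum(P * M)), P

# Hypothetical toy vocabulary with made-up 2-D embeddings (assumption).
vocab = ["nation", "country", "war", "economy"]
emb = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 0.0]])

# Ground metric on words: Euclidean distance between embeddings.
M = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)

def bag_of_words(tokens):
    """Normalized word-count histogram over the toy vocabulary."""
    counts = np.array([tokens.count(w) for w in vocab], dtype=float)
    return counts / counts.sum()

doc_a = bag_of_words(["nation", "war", "nation"])
doc_b = bag_of_words(["country", "war", "country"])
doc_c = bag_of_words(["economy", "economy", "war"])

d_ab, P_ab = sinkhorn_cost(doc_a, doc_b, M)
d_ac, P_ac = sinkhorn_cost(doc_a, doc_c, M)
print(f"d(a, b) = {d_ab:.3f}, d(a, c) = {d_ac:.3f}")
```

In this toy setting, documents a and b share no word other than "war", yet their transport distance is smaller than that between a and c because "nation" and "country" are close under the ground metric. A learned ground metric would replace the fixed matrix `M`; a smaller `reg` brings the cost closer to the exact transport cost but typically requires more iterations.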

Figure 11. Examples of two histograms (bags-of-words) extracted from congressional speeches of US presidents. In this application, the goal is to infer a meaningful metric on the words of the English language and to lift this metric to histograms using OT techniques.
[Images: IMG/lincoln.png, IMG/obama.png]