Section: Partnerships and Cooperations

International Initiatives

Inria International Labs


Associate Team involved in the International Lab:

  • Title: LEarning GOod representations for natural language processing

  • International Partner (Institution - Laboratory - Researcher):

    • University of Southern California (United States) - Theoretical and Empirical Data Science (TEDS) research group, Department of Computer Science - Fei Sha

  • Start year: 2019

  • See also: https://team.inria.fr/lego/

  • LEGO lies at the intersection of Machine Learning and Natural Language Processing (NLP). Its goal is to address the following questions: what are the right representations for text data, and how can they be learned in a robust and transferable way? How can such representations be applied to real-world NLP tasks, specifically in scenarios where linguistic resources are scarce? Recent years have seen increasing interest in learning continuous vector embeddings, which can be trained together with the prediction model in an end-to-end fashion, as in recent sequence-to-sequence neural models. However, such approaches are ill-suited to low-resource languages because they require massive amounts of training data. They are also prone to overfitting, which makes them brittle, and they are sensitive to bias present in the original text as well as to confounding factors such as author attributes. LEGO relies on the complementary expertise of the two partners in areas such as representation learning, structured prediction, graph-based learning, multi-task/transfer learning, and statistical NLP to offer a novel alternative to existing techniques. Specifically, we propose to investigate two research directions: (a) optimizing the representations to make them robust to bias and adversarial examples, and (b) learning representations that transfer across languages and domains, in particular in the context of structured prediction problems for low-resource languages. We will demonstrate the usefulness of the proposed methods on several NLP tasks, including multilingual dependency parsing, machine translation, question answering, and text summarization.

Inria Associate Teams Not Involved in an Inria International Lab

  • North-European Associate Team PAD-ML: Privacy-Aware Distributed Machine Learning.

  • International Partner: the PPDA team at the Alan Turing Institute.

  • Start year: 2018

  • In the context of increasing legislation on data protection (e.g., the recent GDPR), an important challenge is to develop privacy-preserving algorithms to learn from datasets distributed across multiple data owners who do not want to share their data. The goal of this joint team is to devise novel privacy-preserving, distributed machine learning algorithms and to assess their performance and guarantees in both theoretical and practical terms.