Machine learning is a recent scientific domain, positioned between applied mathematics, statistics and computer science. Its goals are the optimization, control, and modeling of complex systems from examples. It applies to data from numerous engineering and scientific fields (e.g., vision, bioinformatics, neuroscience, audio processing, text processing, economics, finance, etc.), the ultimate goal being to derive general theories and algorithms allowing advances in each of these domains. Machine learning is characterized by the high quality and quantity of the exchanges between theory, algorithms and applications: interesting theoretical problems almost always emerge from applications, while theoretical analysis allows the understanding of why and when popular or successful algorithms do or do not work, and leads to proposing significant improvements.

Our academic positioning is exactly at the intersection between these three aspects—algorithms, theory and applications—and our main research goal is to make the link between theory and algorithms, and between algorithms and high-impact applications in various engineering and scientific fields, in particular computer vision, bioinformatics, audio processing, text processing and neuro-imaging.

Machine learning is now a vast field of research and the team focuses on the following aspects: supervised learning (kernel methods, calibration), unsupervised learning (matrix factorization, statistical tests), parsimony (structured sparsity, theory and algorithms), and optimization (convex optimization, bandit learning). These four research axes are strongly interdependent, and the interplay between them is key to successful practical applications.

Rodolphe Jenatton (former PhD student, graduated in 2011) received two thesis prizes (Fondation Hadamard and AFIA).

Francis Bach received the Inria young researcher prize.

Monograph published in the collection *Foundations and Trends in Machine Learning*: “Optimization with sparsity-inducing penalties”.

This part of our research focuses on methods where, given a set of examples of input/output pairs, the goal is to predict the output for a new input, with research on kernel methods, calibration methods, and multi-task learning.

We focus here on methods where no output is given and the goal is to find structure of certain known types (e.g., discrete or low-dimensional) in the data, with a focus on matrix factorization, statistical tests, dimension reduction, and semi-supervised learning.

The concept of parsimony is central to many areas of science. In the context of statistical machine learning, this takes the form of variable or feature selection. The team focuses primarily on structured sparsity, with theoretical and algorithmic contributions (this is the main topic of the ERC starting investigator grant awarded to F. Bach).

Optimization in all its forms is central to machine learning, as many of its theoretical frameworks are based at least in part on empirical risk minimization. The team focuses primarily on convex and bandit optimization, with a particular focus on large-scale optimization.

Machine learning research can be conducted from two main perspectives. The first one, which has been dominant in the last 30 years, is to design learning algorithms and theories which are as generic as possible, the goal being to make as few assumptions as possible regarding the problems to be solved and to let data speak for themselves. This has led to many interesting methodological developments and successful applications. However, we believe that this strategy has reached its limits for many application domains, such as computer vision, bioinformatics, neuro-imaging, text and audio processing, which leads to the second perspective our team is built on: research in machine learning theory and algorithms should be driven by interdisciplinary collaborations, so that specific prior knowledge may be properly introduced into the learning process, in particular with the following fields:

Computer vision: object recognition, object detection, image segmentation, image/video processing, computational photography. In collaboration with the Willow project-team.

Bioinformatics: cancer diagnosis, protein function prediction, virtual screening. In collaboration with Institut Curie.

Text processing: document collection modeling, language models.

Audio processing: source separation, speech/music processing. In collaboration with Telecom Paristech.

Neuro-imaging: brain-computer interface (fMRI, EEG, MEG). In collaboration with the Parietal project-team.

SPAMS (SPArse Modeling Software) is an optimization toolbox for solving various sparse estimation problems: dictionary learning and matrix factorization, sparse decomposition problems, and structured sparse decomposition problems. It is developed by Julien Mairal (former Willow PhD student, co-advised by F. Bach and J. Ponce), in collaboration with Francis Bach (Inria), Jean Ponce (Ecole Normale Supérieure), Guillermo Sapiro (University of Minnesota), Rodolphe Jenatton (Inria) and Guillaume Obozinski (Inria). It is coded in C++ with a Matlab interface. Recently, interfaces for R and Python have been developed by Jean-Paul Chieze (Inria). The software currently has around 650 downloads and between 1,500 and 2,000 page visits per month. See http://
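As an illustration of the kind of sparse decomposition problem SPAMS solves, here is a minimal numpy sketch of the lasso solved by iterative soft-thresholding (ISTA). The dictionary, signal and parameter values are ours for illustration only; this is not the SPAMS API.

```python
import numpy as np

def ista_lasso(D, x, lam=0.05, n_iter=1000):
    """Solve min_a 0.5*||x - D a||^2 + lam*||a||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(D, 2) ** 2              # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - D.T @ (D @ a - x) / L          # gradient step on the smooth part
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return a

rng = np.random.default_rng(0)
D = rng.standard_normal((50, 100))
D /= np.linalg.norm(D, axis=0)                 # unit-norm dictionary atoms
a_true = np.zeros(100)
a_true[[3, 40, 77]] = [1.5, -2.0, 1.0]         # sparse ground-truth code
x = D @ a_true
a_hat = ista_lasso(D, x)
print(np.flatnonzero(np.abs(a_hat) > 0.5))     # large coefficients (should be the true support)
```

With a noiseless signal and a well-conditioned random dictionary, the large entries of `a_hat` recover the support of `a_true`, up to the usual lasso shrinkage of their magnitudes.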

SiGMa - Simple Greedy Matching: a tool for aligning large knowledge-bases

Version 1. Webpage: http://

The tool SiGMa (Simple Greedy Matching) is a knowledge base alignment tool implemented in Python. It takes as input two knowledge bases, each represented as a list of (entity, relationship, entity) triples, together with a partial alignment between the relationships of one knowledge base and the other, and gives as output an ordered list of proposed entity matches between the two knowledge bases (where the order corresponds heuristically to a notion of certainty about these matches). The matching decisions are made in a greedy fashion, combining information from the relationship graph with pairwise similarity scores defined between the entities. The code can exploit various sources of information for this score, such as similarities defined on strings, dates, and other entity properties, and gives a few options to the user.
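The greedy loop described above can be sketched in a few lines of Python. Everything below — the names, the 0.5 stopping threshold, the use of difflib string similarity — is an illustrative choice of ours, not SiGMa's actual code.

```python
from difflib import SequenceMatcher

def greedy_align(kb1, kb2):
    """Toy greedy entity alignment in the spirit of SiGMa.

    kb1, kb2: lists of (entity, relationship, entity) triples.
    Returns matches in the order they were made (a rough certainty order).
    """
    def neighbours(kb):
        nb = {}
        for head, _, tail in kb:
            nb.setdefault(head, set()).add(tail)
            nb.setdefault(tail, set()).add(head)
        return nb

    nb1, nb2 = neighbours(kb1), neighbours(kb2)
    matched, matches = {}, []
    candidates = [(e1, e2) for e1 in nb1 for e2 in nb2]

    def score(pair):
        e1, e2 = pair
        string_sim = SequenceMatcher(None, e1.lower(), e2.lower()).ratio()
        # graph term: fraction of e1's neighbours whose match is a neighbour of e2
        agree = sum(1 for n in nb1[e1] if matched.get(n) in nb2[e2])
        return string_sim + agree / len(nb1[e1])

    while candidates:
        best = max(candidates, key=score)
        if score(best) < 0.5:                  # stop when no confident match remains
            break
        matched[best[0]] = best[1]
        matches.append(best)
        candidates = [(a, b) for a, b in candidates
                      if a != best[0] and b != best[1]]
    return matches

kb_a = [("Paris", "capital_of", "France"), ("Lyon", "city_in", "France")]
kb_b = [("paris", "capitalOf", "france"), ("lyon", "cityIn", "france")]
print(greedy_align(kb_a, kb_b))
```

Note how previously made matches feed back into the graph term of the score, so later, less string-obvious matches can be resolved through the relationship structure — the key idea behind the iterative propagation.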

We also provide two large-scale knowledge base alignment benchmark datasets with tens of thousands of ground truth pairs: YAGO aligned to IMDb as well as Freebase aligned to IMDb.

Participants outside of Sierra: Konstantina Palla, Alex Davies, Zoubin Ghahramani (Machine Learning Group, Department of Engineering, University of Cambridge); Gjergji Kasneci, Thore Graepel (Microsoft Research Cambridge)

minFunc is a Matlab function for unconstrained optimization of differentiable real-valued multivariate functions using line-search methods. It uses an interface very similar to that of the Matlab Optimization Toolbox function fminunc, and can be called as a drop-in replacement for it. On many problems, minFunc requires fewer function evaluations to converge than fminunc (or minimize.m). Further, it can optimize problems with a much larger number of variables (fminunc is restricted to several thousand variables), and it uses a line search that is robust to several common function pathologies.

The default parameters of minFunc call a quasi-Newton strategy, where limited-memory BFGS updates with Shanno-Phua scaling are used in computing the step direction, and a bracketing line search for a point satisfying the strong Wolfe conditions is used to compute the step length. In the line search, (safeguarded) cubic interpolation is used to generate trial values, and the method switches to an Armijo backtracking line search on iterations where the objective function enters a region where it does not produce a real-valued output (i.e., it is complex, NaN, or Inf).
See http://
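The quasi-Newton strategy above can be illustrated with a small pure-Python sketch of L-BFGS (minFunc itself is Matlab; the code below is our own toy version, with a simple Armijo backtracking search in place of minFunc's default strong-Wolfe search).

```python
import numpy as np

def lbfgs(f, grad, x0, m=10, max_iter=100, tol=1e-8):
    """Minimal limited-memory BFGS with Armijo backtracking (illustrative only)."""
    x, g = x0.astype(float), grad(x0)
    s_hist, y_hist = [], []
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # Two-loop recursion: apply the inverse-Hessian approximation to g.
        q, alphas = g.copy(), []
        for s, y in zip(reversed(s_hist), reversed(y_hist)):
            rho = 1.0 / (y @ s)
            a = rho * (s @ q)
            alphas.append((a, rho, s, y))
            q -= a * y
        if y_hist:
            s, y = s_hist[-1], y_hist[-1]
            q *= (s @ y) / (y @ y)             # initial Hessian scaling (cf. Shanno-Phua)
        for a, rho, s, y in reversed(alphas):
            q += (a - rho * (y @ q)) * s
        d = -q
        # Armijo backtracking line search for the step length.
        t, fx = 1.0, f(x)
        while f(x + t * d) > fx + 1e-4 * t * (g @ d) and t > 1e-12:
            t *= 0.5
        x_new, g_new = x + t * d, grad(x + t * d)
        if (x_new - x) @ (g_new - g) > 1e-12:  # keep only curvature-positive pairs
            s_hist.append(x_new - x)
            y_hist.append(g_new - g)
            if len(s_hist) > m:
                s_hist.pop(0)
                y_hist.pop(0)
        x, g = x_new, g_new
    return x

# Strongly convex quadratic test problem: the minimum is at A^{-1} b.
A = np.diag([1.0, 2.0, 5.0, 10.0, 20.0])
b = np.ones(5)
x_star = lbfgs(lambda x: 0.5 * x @ A @ x - b @ x, lambda x: A @ x - b, np.zeros(5))
print(x_star)  # ≈ [1, 0.5, 0.2, 0.1, 0.05]
```

The two-loop recursion computes the step direction from the stored (s, y) pairs without ever forming the Hessian approximation, which is what makes the limited-memory variant scale to large numbers of variables.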

The prettyPlot function is a simple wrapper to Matlab's plot function for quickly making nicer-looking plots. Its features:
It makes the default line styles thicker and the default fonts nicer.
Options are passed as a single structure, instead of through plot's long list of arguments.
You can pass in cell arrays to have lines of different lengths.
You can pass an

SegAnnot: an R package for fast segmentation of annotated piecewise constant signals. Tech report and R package. Standard segmentation models for piecewise constant signals do not always agree with an expert's visual interpretation of the signal, as encoded using a set of annotations. This R package implements a dynamic programming algorithm which can be used to quickly find a segmentation model in agreement with expert annotations.
Collaboration with Guillem Rigaill (Inria - AgroParisTech). See http://
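For reference, the classical dynamic-programming recursion for segmenting a piecewise constant signal can be sketched as follows. This is the generic unconstrained version, written by us as a toy; SegAnnot's algorithm additionally restricts the changepoints so that the segmentation agrees with the expert annotations.

```python
import numpy as np

def best_segmentation(x, k):
    """DP for the least-squares piecewise-constant fit of x with k segments.

    Returns the k-1 changepoint indices of the optimal segmentation.
    """
    n = len(x)
    csum = np.concatenate([[0.0], np.cumsum(x)])
    csum2 = np.concatenate([[0.0], np.cumsum(x ** 2)])

    def sse(i, j):  # squared error of fitting x[i:j] by its mean
        s = csum[j] - csum[i]
        return csum2[j] - csum2[i] - s * s / (j - i)

    cost = np.full((k + 1, n + 1), np.inf)
    back = np.zeros((k + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for seg in range(1, k + 1):
        for j in range(seg, n + 1):
            for i in range(seg - 1, j):
                c = cost[seg - 1, i] + sse(i, j)
                if c < cost[seg, j]:
                    cost[seg, j], back[seg, j] = c, i
    cps, j = [], n
    for seg in range(k, 1, -1):                # backtrack the optimal changepoints
        j = int(back[seg, j])
        cps.append(j)
    return sorted(cps)

x = np.concatenate([np.zeros(20), 5 * np.ones(20), np.ones(20)])
print(best_segmentation(x, 3))  # [20, 40]
```

The cumulative sums make each segment cost an O(1) lookup, so the whole recursion runs in O(k n²) time; the annotation-constrained variant prunes the inner loop to changepoints compatible with the annotations.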

Collaboration with Alain Celisse (University Lille 1; Inria Lille, MODAL team) and Zaïd Harchaoui (Inria Grenoble, LEAR team).

Collaboration with Matthieu Lerasle (CNRS, University Nice Sophia Antipolis).

In the course of the year 2011-2012, two articles were submitted to and accepted at international workshops. The first published article, **Improved brain pattern recovery through ranking approaches** (), was presented at the 2nd International Workshop on Pattern Recognition in NeuroImaging in London, July 2012; it proposes a new approach to estimating the coefficients of a generalized linear model under a monotonicity constraint. For this, we explore the use of ranking techniques, which are popular in information retrieval but novel for medical imaging applications.

The second published article, **Learning to rank from medical imaging data** (), uses the same techniques as the previous article to solve a more fundamental problem, that is, to predict a quantitative (and potentially non-linear) variable from a set of noisy measurements. We show on simulations and two fMRI datasets that this approach is able to predict the correct ordering on pairs of images, yielding higher prediction accuracy than standard regression and multiclass classification techniques.

Collaboration with the Parietal project-team (A. Gramfort, B. Thirion, G. Varoquaux)

The Internet has enabled the creation of a growing number of large-scale knowledge bases in a variety of domains containing complementary information. Tools for automatically aligning these knowledge bases would make it possible to unify many sources of structured knowledge and answer complex queries. However, the efficient alignment of large-scale knowledge bases still poses a considerable challenge. In , we present Simple Greedy Matching (SiGMa), a simple algorithm for aligning knowledge bases with millions of entities and facts. SiGMa is an iterative propagation algorithm which leverages both the structural information from the relationship graph as well as flexible similarity measures between entity properties in a greedy local search, thus making it scalable. Despite its greedy nature, our experiments indicate that SiGMa can efficiently match some of the world's largest knowledge bases with high precision. We provide additional experiments on benchmark datasets which demonstrate that SiGMa can outperform state-of-the-art approaches both in accuracy and efficiency.

Collaboration with Konstantina Palla, Alex Davies, Zoubin Ghahramani (Machine Learning Group, Department of Engineering, University of Cambridge); Gjergji Kasneci (Max Planck Institut fur Informatik); Thore Graepel (Microsoft Research Cambridge).

Collaboration with Martin Jaggi (Centre de Mathématiques Appliquées, Ecole Polytechnique); Patrick Pletscher (Machine Learning Laboratory, ETH Zurich).

Bottom-up, fully unsupervised segmentation remains a daunting challenge for computer vision. In the cosegmentation context, on the other hand, the availability of multiple images assumed to contain instances of the same object classes provides a weak form of supervision that can be exploited by discriminative approaches. Unfortunately, most existing algorithms are limited to a very small number of images and/or object classes (typically two of each). In , we propose a novel energy-minimization approach to cosegmentation that can handle multiple classes and a significantly larger number of images. The proposed cost function combines spectral- and discriminative-clustering terms, and it admits a probabilistic interpretation. It is optimized using an efficient EM method, initialized using a convex quadratic approximation of the energy. Comparative experiments show that the proposed approach matches or improves the state of the art on several standard datasets.

Collaboration with the Willow project-team (J. Ponce).

Many data, such as social networks, movie preferences or knowledge bases, are multi-relational, in that they describe multiple relations between entities. While there is a large body of work focused on modeling these data, modeling these multiple types of relations jointly remains challenging. Further, existing approaches tend to break down when the number of these types grows. In , we propose a method for modeling large multi-relational datasets, with possibly thousands of relations. Our model is based on a bilinear structure, which captures various orders of interaction of the data, and also shares sparse latent factors across different relations. We illustrate the performance of our approach on standard tensor-factorization datasets where we attain, or outperform, state-of-the-art results. Finally, an NLP application demonstrates our scalability and the ability of our model to learn efficient and semantically meaningful verb representations.

Collaboration with R. Jenatton (CMAP, Ecole Polytechnique) and Antoine Bordes (CNRS, Université de Technologie de Compiègne).

Collaboration with C. Févotte (Laboratoire traitement et communication de l'information (LTCI), CNRS: UMR5141 - Institut Télécom - Télécom ParisTech).

Google Research Award: “Large scale adaptive machine learning with finite data sets”.

S. Arlot, member of the ANR Calibration project

Title: Statistical calibration

Coordinator: University Paris Dauphine

Leader: Vincent Rivoirard

Other members: 34 members, mostly from CEREMADE (Paris Dauphine), Laboratoire Jean-Alexandre Dieudonné (Université de Nice) and Laboratoire de Mathématiques de l'Université Paris Sud

Instrument: ANR Blanc

Duration: Jan 2012 - Dec 2015

Total funding: 240 000 euros

Title: New statistical approaches to computer vision and bioinformatics

Coordinator: Ecole Normale Supérieure (Paris)

Leader of the project: Sylvain Arlot

Other members: J. Sivic (Willow project-team, ENS), A. Celisse (University Lille 1), T. Mary-Huard (AgroParisTech), E. Roquain and F. Villers (University Paris 6).

Instrument: ANR, Young researchers Program

Duration: Sep 2009 - Aug 2012

Total funding: 70 000 euros

Abstract: The Detect project aims at providing new statistical approaches for detection problems in computer vision (in particular, detecting and recognizing human actions in videos) and bioinformatics (e.g., simultaneously segmenting CGH profiles). These problems are mainly of two different statistical nature: multiple change-point detection (i.e., partitioning a sequence of observations into homogeneous contiguous segments) and multiple tests (i.e., controlling a priori the number of false positives among a large number of tests run simultaneously).

Title: SIERRA – Sparse structured methods for machine learning

Type: IDEAS

Instrument: ERC Starting Grant

Duration: December 2009 - November 2014

Coordinator: Inria (France)

See also: http://

Abstract: Machine learning is now a core part of many research domains, where the abundance of data has forced researchers to rely on automated processing of information. The main current paradigm of application of machine learning techniques consists in two sequential stages: in the representation phase, practitioners first build a large set of features and potential responses for model building or prediction. Then, in the learning phase, off-the-shelf algorithms are used to solve the appropriate data processing tasks. While this has led to significant advances in many domains, the potential of machine learning techniques is far from being reached: the tenet of this proposal is that to achieve the expected breakthroughs, this two-stage paradigm should be replaced by an integrated process where the

Title: Fast Statistical Analysis of Web Data via Sparse Learning

Inria principal investigator: Francis Bach

International Partner (Institution - Laboratory - Researcher):

University of California Berkeley (United States) - EECS and IEOR Departments - Laurent El Ghaoui

Duration: 2011 - 2013

See also: http://

The goal of the proposed research is to provide web-based tools for the analysis and visualization of large corpora of text documents, with a focus on databases of news articles. We intend to use advanced algorithms, drawing on recent progress in machine learning and statistics, to allow a user to quickly produce a short summary and an associated timeline showing how a certain topic is described in the news media. We are also interested in unsupervised learning techniques that allow a user to understand the differences between several news sources, topics or documents.

Michael Jordan (U.C. Berkeley, http://

F. Bach: Journal of Machine Learning Research, Action Editor.

F. Bach: IEEE Transactions on Pattern Analysis and Machine Intelligence, Associate Editor.

F. Bach: Information and Inference, Associate Editor.

F. Bach: SIAM Journal on Imaging Sciences, Associate Editor.

G. Obozinski: Journal of Machine Learning Research, Member of the Editorial Board.

G. Obozinski: International conference on Artificial Intelligence and Statistics (AISTATS) 2012.

F. Bach: International Conference on Machine Learning, 2012.

S. Lacoste-Julien, F. Bach: Conference on Uncertainty in Artificial Intelligence, 2012.

Journals: Annals of Statistics, Machine Learning, Journal of Machine Learning Research (JMLR), IEEE Transactions on Information Theory, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Information and Inference (IMAIAI), Scandinavian Journal of Statistics (SJS), Statistics and Computing, Annales de l'IHP

Conferences: UAI, ECML, NIPS, CVPR, ICML, COLT, AISTATS.

S. Arlot is member of the board for the entrance exam in Ecole Normale Supérieure (mathematics, voie B/L).

M. Schmidt, Session Organizer at International Conference on Continuous Optimization (July 27 - August 1, 2013).

G. Obozinski, Co-organizer of the workshop “Sparsity, Dictionaries and Projections in Machine Learning and Signal Processing” at ICML 2012, Edinburgh, Scotland.
http://

F. Bach: International Conference on Machine Learning, 2012, Workshop co-chair.

F. Bach: Co-organizer of the NIPS workshop on “Analysis Operator Learning vs. Dictionary Learning: Fraternal Twins in Sparse Modeling”, https://

Licence : S. Arlot, F. Bach, G. Obozinski, “Apprentissage statistique”, 35h, Ecole Normale Supérieure, Filière “Math-Info”, première année.

Licence: G. Obozinski, Introduction aux modèles graphiques (4h) in Enseignement spécialisé "Apprentissage artificiel" for second year students at Ecole des Mines.

Mastère: S. Arlot and F. Bach, "Statistical learning", 24h, Mastère M2, Université Paris-Sud, France.

Mastère: G. Obozinski, N. Le Roux, co-teaching (6h) of “Introduction à l'apprentissage machine appliqué aux neurosciences et à la cognition” in the Master Recherche en Sciences Cognitives co-habilitated by EHESS, ENS and Université Paris Descartes.

Mastère: F. Bach, G. Obozinski, Introduction aux modèles graphiques (30h), Master MVA (Ecole Normale Supérieure de Cachan).

Doctorat: S. Arlot, "Model selection via penalization, resampling and cross-validation, with application to change-point detection", 6h, Université de Cergy.

Doctorat: G. Obozinski, Probabilistic graphical models for Information Retrieval, in the Russian Summer School in Information Retrieval (RuSSIR 2012), Yaroslavl, Russia.

Doctorat: F. Bach, International Computer Vision Summer School 2012, 3h, Sicily.

Doctorat: F. Bach, Summer school on Visual Recognition and Machine Learning, 3h, Grenoble.

Doctorat: F. Bach: Machine learning summer school (MLSS), 3h, Kyoto, Japan.

PhD : Toby Hocking, “Learning algorithms and statistical software, with applications to bioinformatics”, ENS Cachan, November 20, 2012, Advisors: F. Bach, J.-P. Vert (Ecole des Mines de Paris - Institut Curie).

PhD: Augustin Lefèvre, “Dictionary learning methods for single-channel audio source separation”, ENS Cachan, October 3, 2012, Advisors: F. Bach and C. Févotte (Telecom Paristech).

PhD: Armand Joulin, “Convex optimization for co-segmentation”, ENS Cachan, December 17, 2012, Advisors: F. Bach and J. Ponce (Willow project-team).

S. Arlot, "Optimal model selection with V-fold cross-validation: how should V be chosen?", World Congress in Probability and Statistics 2012, Istanbul.

S. Arlot, "Resampling-based estimation of the accuracy of satellite ephemerides", Inaugural Conference of the Laboratory Fibonacci, Scuola Normale Superiore di Pisa, 2012.

S. Arlot, "Choix de V pour la sélection de modèles par validation croisée V-fold en estimation de densité", Séminaire parisien de statistique, IHP, Paris, 2012.

S. Arlot, "Calibration automatique d'estimateurs linéaires à l'aide de pénalités minimales, application à la régression multi-tâches.", Séminaire de Statistique de l'IMT, Toulouse, 2012.

F. Bach, International Conference on Pattern Recognition Applications and Methods, Faro, Portugal, 2012 (keynote speaker).

F. Bach, Rank Prize Symposium (invited talk), Lake District, England, 2012.

F. Bach, University of Cambridge (two seminars), 2012.

F. Bach, Schlumberger workshop on Mathematical Models of Sound Analysis, IHES (invited talk), 2012.

F. Bach, Joint Pattern Recognition Symposium of the German Association for Pattern Recognition (DAGM) (invited talk), Graz, Austria, 2012.

F. Bach, International Workshop on Machine Learning for Signal Processing (plenary lecture), Santander, Spain, 2012.

F. Bach, Seminar Max-Planck Institute, Tübingen, October 2012.

E. Grave, Laboratoire d'Informatique de Paris 6, Université Pierre et Marie Curie (Seminar), 2012.

S. Lacoste-Julien, "Harnessing the structure of data for discriminative machine learning", Colloque du Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, Montréal, Canada, February 2012

S. Lacoste-Julien, "Structured Alignment Methods in Machine Learning", SMT seminar at the LIMSI, Orsay, France, July 2012

S. Lacoste-Julien, "Frank-Wolfe optimization insights in machine learning", machine learning seminar, University of Toronto, Toronto, Canada, August 2012.

S. Lacoste-Julien, "Frank-Wolfe optimization insights in machine learning", Machine Learning Group seminar, University of Cambridge, Cambridge, UK, August 2012.

S. Lacoste-Julien, "Harnessing the structure of data in machine learning", invited talk, Department of Engineering Science, University of Oxford, Oxford, UK, September 2012.

S. Lacoste-Julien, "Frank-Wolfe optimization insights in machine learning", invited talk, Stanford AI Lab, Stanford University, Stanford, USA, December 2012.

G. Obozinski, Swiss Statistical Seminar, Bern, Switzerland, April 2012.

G. Obozinski, Statistics Seminar, University of Pennsylvania, Philadelphia, PA, USA, May 2012.

G. Obozinski, Séminaire de Statistiques, Université Paris 11, May 2012.

G. Obozinski, Journées de Statistiques (annual conference of the Société Française de Statistique), Université Libre de Bruxelles, Belgium, May 2012.

G. Obozinski, World Congress in Probability and Statistics, Istanbul, Turkey, July 2012.

G. Obozinski, séminaire du CEREMADE, Université Paris-Dauphine, December 2012.

M. Schmidt, NAIS Workshop on Advances in Large-Scale Optimization, Edinburgh, May 24-25, 2012.

M. Schmidt, International Symposium on Mathematical Programming, Berlin, August 19-24, 2012.

M. Schmidt, University of British Columbia, "Linearly-Convergent Stochastic-Gradient Methods", Seminar, December 10, 2012.

M. Schmidt, Simon Fraser University, "Opening up the black box: Faster methods for non-smooth and big-data optimization", Seminar, December 11, 2012.

M. Schmidt, NSERC Postdoctoral Fellowship (January 2012 - December 2013).

S. Lacoste-Julien, Research in Paris fellowship 2011-2012.

F. Bach: Inria young researcher prize, 2012.

R. Jenatton: Thesis prize from Fondation Hadamard, 2012.

R. Jenatton: Thesis prize from AFIA, accessit, 2012.

Participation in the Inria-Rocquencourt “Fête de la Science”, 2012.