Machine learning is a recent scientific domain, positioned between applied mathematics, statistics and computer science. Its goals are the optimization, control, and modeling of complex systems from examples. It applies to data from numerous engineering and scientific fields (e.g., vision, bioinformatics, neuroscience, audio processing, text processing, economics, finance, etc.), the ultimate goal being to derive general theories and algorithms enabling advances in each of these domains. Machine learning is characterized by the high quality and quantity of the exchanges between theory, algorithms and applications: interesting theoretical problems almost always emerge from applications, while theoretical analysis allows the understanding of why and when popular or successful algorithms do or do not work, and leads to significant improvements.

Our academic positioning is exactly at the intersection between these three aspects—algorithms, theory and applications—and our main research goal is to make the link between theory and algorithms, and between algorithms and high-impact applications in various engineering and scientific fields, in particular computer vision, bioinformatics, audio processing, text processing and neuro-imaging.

Machine learning is now a vast field of research and the team focuses on the following aspects: supervised learning (kernel methods, calibration), unsupervised learning (matrix factorization, statistical tests), parsimony (structured sparsity, theory and algorithms), and optimization (convex optimization, bandit learning). These four research axes are strongly interdependent, and the interplay between them is key to successful practical applications.

This part of our research focuses on methods where, given a set of examples of input/output pairs, the goal is to predict the output for a new input, with research on kernel methods, calibration methods, and multi-task learning.

We focus here on methods where no output is given and the goal is to find structure of certain known types (e.g., discrete or low-dimensional) in the data, with a focus on matrix factorization, statistical tests, dimension reduction, and semi-supervised learning.

The concept of parsimony is central to many areas of science. In the context of statistical machine learning, this takes the form of variable or feature selection. The team focuses primarily on structured sparsity, with theoretical and algorithmic contributions (this is the main topic of the ERC starting investigator grant awarded to F. Bach).

Optimization in all its forms is central to machine learning, as many of its theoretical frameworks are based at least in part on empirical risk minimization. The team focuses primarily on convex and bandit optimization, with a particular focus on large-scale optimization.

Machine learning research can be conducted from two main perspectives. The first, which has been dominant over the last 30 years, is to design learning algorithms and theories that are as generic as possible, the goal being to make as few assumptions as possible regarding the problems to be solved and to let data speak for themselves. This has led to many interesting methodological developments and successful applications. However, we believe that this strategy has reached its limit for many application domains, such as computer vision, bioinformatics, neuro-imaging, text and audio processing. This leads to the second perspective our team is built on: research in machine learning theory and algorithms should be driven by interdisciplinary collaborations, so that specific prior knowledge may be properly introduced into the learning process, in particular with the following fields:

Computer vision: object recognition, object detection, image segmentation, image/video processing, computational photography. In collaboration with the Willow project-team.

Bioinformatics: cancer diagnosis, protein function prediction, virtual screening. In collaboration with Institut Curie.

Text processing: document collection modeling, language models.

Audio processing: source separation, speech/music processing. In collaboration with Télécom ParisTech.

Neuro-imaging: brain-computer interface (fMRI, EEG, MEG). In collaboration with the Parietal project-team.

F. Bach has served as a program co-chair for the International Conference on Machine Learning (ICML) held in Lille, France, in 2015.

The DICA package contains Matlab and C++ (via Matlab MEX files) implementations of estimation in the LDA and closely related DICA models.

The implementation consists of two parts. One part contains an efficient implementation for the construction of the moment/cumulant tensors, while the other contains implementations of several so-called joint-diagonalization-type algorithms used for matching the tensors. Any tensor type (see below) can be arbitrarily combined with one of the diagonalization algorithms (see below), leading, in total, to 6 algorithms.

Two types of tensors are considered: (a) the LDA moments and (b) the DICA cumulants. The diagonalization algorithms include: (a) the orthogonal joint diagonalization algorithm based on iterative Jacobi rotations, (b) the spectral algorithm based on two eigendecompositions, and (c) the tensor power method.
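As an illustration of option (c), here is a minimal NumPy sketch of the tensor power method on a synthetic orthogonally decomposable tensor. This is not the package's Matlab/C++ implementation; the function name and the test tensor are introduced only for this example.

```python
import numpy as np

def tensor_power_method(T, n_iter=200, seed=0):
    # Robust eigenvector of a symmetric 3rd-order tensor via power iteration:
    # repeatedly map u -> T(I, u, u) and renormalize.
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        v = np.einsum('ijk,j,k->i', T, u, u)    # contract T with u twice
        u = v / np.linalg.norm(v)
    lam = np.einsum('ijk,i,j,k->', T, u, u, u)  # associated eigenvalue
    return lam, u

# Synthetic orthogonally decomposable tensor: sum_r w_r * a_r (x) a_r (x) a_r.
a = np.eye(3)
w = np.array([3.0, 2.0, 1.0])
T = sum(w[r] * np.einsum('i,j,k->ijk', a[r], a[r], a[r]) for r in range(3))
lam, u = tensor_power_method(T)
```

With a random start, the iteration converges to one of the components (a_r, w_r); deflation (subtracting the recovered component) would recover the others in turn.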

Contact: Anastasia Podosinnikova

This is the code to reproduce all the experiments in the NIPS 2015 paper: "On the Global Linear Convergence of Frank-Wolfe Optimization Variants" by Simon Lacoste-Julien and Martin Jaggi, which covers the global linear convergence rate of Frank-Wolfe optimization variants for problems described as in Eq. (1) in the paper. It contains the implementation of Frank-Wolfe, away-steps Frank-Wolfe and pairwise Frank-Wolfe on two applications.

Contact: Simon Lacoste-Julien

Contact: Anton Osokin

Collaboration with Martin Jaggi (ETH Zurich).

The Frank-Wolfe (FW) optimization algorithm has lately re-gained popularity thanks in particular to its ability to nicely handle the structured constraints appearing in machine learning applications. However, its convergence rate is known to be slow (sublinear) when the solution lies at the boundary. A simple, less-known fix is to add the possibility of taking 'away steps' during optimization, an operation that importantly does not require a feasibility oracle. In this paper, we highlight and clarify several variants of the Frank-Wolfe optimization algorithm that have been successfully applied in practice: away-steps FW, pairwise FW, fully-corrective FW and Wolfe's minimum norm point algorithm, and prove for the first time that they all enjoy global linear convergence, under a weaker condition than strong convexity of the objective. The constant in the convergence rate has an elegant interpretation as the product of the (classical) condition number of the function with a novel geometric quantity that plays the role of a 'condition number' of the constraint set. We provide pointers to where these algorithms have made a difference in practice, in particular with the flow polytope, the marginal polytope and the base polytope for submodular optimization.
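To make the away-steps variant concrete, below is a minimal NumPy sketch (not the paper's released code) on the probability simplex, where the barycentric weights of the active set coincide with the coordinates of the iterate; the function name and the crude grid line search are illustrative choices.

```python
import numpy as np

def away_steps_fw(f, grad_f, n, n_iter=500):
    # Away-steps Frank-Wolfe on the probability simplex. On the simplex,
    # the active-set weights are the coordinates of x themselves, which
    # keeps the away-step bookkeeping trivial.
    x = np.zeros(n)
    x[0] = 1.0
    for _ in range(n_iter):
        g = grad_f(x)
        s = int(np.argmin(g))                      # Frank-Wolfe vertex (LMO)
        support = np.flatnonzero(x > 1e-12)
        v = support[np.argmax(g[support])]         # away vertex
        d_fw = -x.copy(); d_fw[s] += 1.0           # e_s - x
        d_aw = x.copy();  d_aw[v] -= 1.0           # x - e_v
        if g @ d_fw <= g @ d_aw:
            d, t_max = d_fw, 1.0                   # standard FW step
        else:                                      # away step (may drop v)
            d, t_max = d_aw, x[v] / max(1.0 - x[v], 1e-12)
        ts = np.linspace(0.0, t_max, 201)          # crude grid line search
        t = ts[np.argmin([f(x + ti * d) for ti in ts])]
        x = x + t * d
    return x

# Example: Euclidean projection of y onto the simplex, min_x 0.5 ||x - y||^2,
# whose solution lies on the boundary of the simplex.
y = np.array([0.9, 0.8, -0.5])
x_star = away_steps_fw(lambda x: 0.5 * np.sum((x - y) ** 2),
                       lambda x: x - y, 3)
```

Away steps matter precisely in examples like this one, where the optimum sits on a face of the polytope and plain FW would zig-zag sublinearly.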

Collaboration with Rahul G. Krishnan [correspondent] and David Sontag (NYU).

Collaboration with Fredrik Lindsten (University of Cambridge).

Recently, the Frank-Wolfe optimization algorithm was suggested as a procedure to obtain adaptive quadrature rules for integrals of functions in a reproducing kernel Hilbert space (RKHS), with a potentially faster rate of convergence than Monte Carlo integration (and “kernel herding” was shown to be a special case of this procedure). In this paper, we propose to replace the random sampling step in a particle filter by Frank-Wolfe optimization. By optimizing the position of the particles, we can obtain better accuracy than random or quasi-Monte Carlo sampling. In applications where the evaluation of the emission probabilities is expensive (such as in robot localization), the additional computational cost of generating the particles through optimization can be justified. Experiments on standard synthetic examples as well as on a robot localization task indeed indicate an improvement of accuracy over random and quasi-Monte Carlo sampling.
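The kernel herding special case mentioned above can be sketched in a few lines of NumPy: points are picked greedily so that their empirical kernel mean tracks the kernel mean embedding of a sample. This is a generic illustration of the idea (RBF kernel on a 1D grid of candidates), not the particle-filter code of the paper; `kernel_herding` is a hypothetical name.

```python
import numpy as np

def kernel_herding(samples, candidates, n_points, sigma=1.0):
    # Greedy Frank-Wolfe / "kernel herding" quadrature: at each step pick
    # the candidate most under-represented by the points chosen so far,
    # as measured in the RKHS of an RBF kernel with bandwidth `sigma`.
    def k(a, b):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

    mu = k(candidates, samples).mean(axis=1)  # <mu_p, phi(x_c)> per candidate
    herd = np.zeros_like(mu)                  # running sum of k(x_c, chosen)
    chosen = []
    for t in range(1, n_points + 1):
        idx = int(np.argmax(mu - herd / t))   # herding / FW selection rule
        chosen.append(candidates[idx])
        herd += k(candidates, np.array([candidates[idx]]))[:, 0]
    return np.array(chosen)

rng = np.random.default_rng(0)
samples = rng.standard_normal(2000)
grid = np.linspace(-4.0, 4.0, 401)
points = kernel_herding(samples, grid, 30)
```

A handful of herded points already matches low-order moments of the sample well, which is the property the paper exploits in place of random resampling.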

Collaboration with Thomas Hofmann [correspondent], Aurelien Lucchi and Brian McWilliams (ETH Zurich).

Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet its slow convergence can be a computational bottleneck. Variance reduction techniques such as SAG, SVRG and SAGA have been proposed to overcome this weakness, achieving linear convergence. However, these methods are either based on computations of full gradients at pivot points, or on keeping per data point corrections in memory. Therefore speed-ups relative to SGD may need a minimal number of epochs in order to materialize. This paper investigates algorithms that can exploit neighborhood structure in the training data to share and re-use information about past stochastic gradients across data points, which offers advantages in the transient optimization phase. As a side-product we provide a unified convergence analysis for a family of variance reduction algorithms, which we call memorization algorithms. We provide experimental results supporting our theory.
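As background, here is a minimal NumPy sketch of SVRG, one of the variance-reduction methods cited above (not the memorization algorithms proposed in the paper), applied to ridge regression; function name, step size and loop lengths are illustrative choices.

```python
import numpy as np

def svrg(X, y, lam=0.1, step=0.01, n_outer=50, seed=0):
    # SVRG for ridge regression: min_w (1/2n)||Xw - y||^2 + (lam/2)||w||^2.
    # Each outer loop computes one full gradient at a pivot point; inner
    # steps use variance-reduced stochastic gradients, which gives linear
    # convergence, unlike plain constant-step SGD.
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    grad_i = lambda v, i: X[i] * (X[i] @ v - y[i]) + lam * v
    for _ in range(n_outer):
        w_pivot = w.copy()
        mu = X.T @ (X @ w_pivot - y) / n + lam * w_pivot   # full gradient
        for _ in range(2 * n):
            i = rng.integers(n)
            w = w - step * (grad_i(w, i) - grad_i(w_pivot, i) + mu)
    return w

# Check against the closed-form ridge solution on synthetic data.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
y = X @ np.arange(1.0, 6.0) + 0.1 * rng.standard_normal(100)
w_svrg = svrg(X, y)
w_exact = np.linalg.solve(X.T @ X / 100 + 0.1 * np.eye(5), X.T @ y / 100)
```

The full-gradient computation at the pivot is exactly the cost the quoted paragraph refers to: speed-ups over SGD only materialize once a few such epochs have been paid for.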

Collaboration with Alexander Novikov, Dmitry Podoprikhin and Dmitry Vetrov.

Deep neural networks currently demonstrate state-of-the-art performance in several domains. At the same time, models of this class are very demanding in terms of computational resources. In particular, a large amount of memory is required by commonly used fully-connected layers, making it hard to use the models on low-end devices and stopping further increases in model size. In this paper, we convert the dense weight matrices of the fully-connected layers to the Tensor Train format, such that the number of parameters is reduced by a huge factor while the expressive power of the layer is preserved. In particular, for the Very Deep VGG networks we report compression of the dense weight matrix of a fully-connected layer by a factor of up to 200,000, leading to compression of the whole network by a factor of up to 7.
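The Tensor Train format underlying this compression can be illustrated with a short NumPy sketch of the classical TT-SVD algorithm (this is a generic illustration, not the paper's TensorNet code; the function names are hypothetical).

```python
import numpy as np

def tt_svd(tensor, max_rank):
    # TT-SVD (Oseledets' algorithm): sequential truncated SVDs turn a d-way
    # tensor into a chain of 3-way cores G_k of shape (r_{k-1}, n_k, r_k).
    shape = tensor.shape
    cores, r_prev = [], 1
    mat = np.asarray(tensor)
    for n_k in shape[:-1]:
        mat = mat.reshape(r_prev * n_k, -1)
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(U[:, :r].reshape(r_prev, n_k, r))
        mat = s[:r, None] * Vt[:r]          # carry the remainder onward
        r_prev = r
    cores.append(mat.reshape(r_prev, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=1)   # contract adjacent ranks
    return out.reshape(out.shape[1:-1])         # drop boundary ranks of 1

# A rank-1 4-way tensor has TT-ranks 1, so max_rank=2 reconstructs exactly
# while storing far fewer parameters than the dense tensor.
rng = np.random.default_rng(0)
vs = [rng.standard_normal(n) for n in (2, 3, 4, 5)]
T = np.einsum('i,j,k,l->ijkl', *vs)
cores = tt_svd(T, max_rank=2)
T_hat = tt_reconstruct(cores)
```

In the paper's setting the dense weight matrix of a fully-connected layer is first reshaped into such a d-way tensor, and the TT cores replace the matrix as the layer's parameters.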

Collaboration with Tuan-Hung Vu [correspondent] and Ivan Laptev from the Willow project-team.

Person detection is a key problem for many computer vision tasks. While face detection has reached maturity, detecting people under a full variation of camera view-points, human poses, lighting conditions and occlusions is still a difficult challenge. In this work, we focus on detecting human heads in natural scenes. Starting from the recent local R-CNN object detector, we extend it with two types of contextual cues. First, we leverage person-scene relations and propose a Global CNN model trained to predict positions and scales of heads directly from the full image. Second, we explicitly model pairwise relations among objects and train a Pairwise CNN model using a structured-output surrogate loss. The Local, Global and Pairwise models are combined into a joint CNN framework. To train and test our full model, we introduce a large dataset composed of 369,846 human heads annotated in 224,740 movie frames. We evaluate our method and demonstrate improvements in person head detection over several recent baselines on three datasets. We also show improvements in detection speed provided by our model.

Collaboration with Piotr Bojanowski, Josef Sivic and Ivan Laptev from the Willow project-team, and Nishant Agrawal.

Collaboration with Visesh Chari, Ivan Laptev [correspondent] and Josef Sivic from the Willow project-team.

Multi-object tracking has recently been approached with min-cost network flow optimization techniques. Such methods simultaneously resolve multiple object tracks in a video and enable modeling of dependencies among tracks. Min-cost network flow methods also fit well within the “tracking-by-detection” paradigm, where object trajectories are obtained by connecting per-frame outputs of an object detector. Object detectors, however, often fail due to occlusions and clutter in the video. To cope with such situations, we propose an approach that regularizes the tracker by adding second-order costs to the min-cost network flow framework. While solving such a problem with integer variables is NP-hard, we present a convex relaxation with an efficient rounding heuristic which empirically gives certificates of small suboptimality. Results are shown on real-world video sequences and demonstrate that the new constraints help select longer and more accurate tracks, improving over the baseline tracking-by-detection method.

Collaboration with Roman Shapovalov, Dmitry Vetrov and Pushmeet Kohli.

Structured-output learning is a challenging problem, particularly because of the difficulty of obtaining large datasets of fully labelled instances for training. In this paper, we try to overcome this difficulty by presenting a multi-utility learning framework for structured prediction that can learn from training instances with different forms of supervision. We propose a unified technique for inferring the loss functions most suitable for quantifying the consistency of solutions with the given weak annotation. We demonstrate the effectiveness of our framework on the challenging semantic image segmentation problem, for which a wide variety of annotations can be used. For instance, the popular training datasets for semantic segmentation are composed of images with hard-to-generate full pixel labellings, as well as images with easy-to-obtain weak annotations, such as bounding boxes around objects, or image-level labels that specify which object categories are present in an image. Experimental evaluation shows that the use of annotation-specific loss functions dramatically improves segmentation accuracy compared to the baseline system where only one type of weak annotation is used.

Collaboration with Alvaro Barbero, Stefanie Jegelka and Suvrit Sra.

Energy minimization has been an intensely studied core problem in computer vision. With growing image sizes (2D and 3D), it is now highly desirable to run energy minimization algorithms in parallel. But many existing algorithms, in particular some efficient combinatorial algorithms, are difficult to parallelize. By exploiting results from convex and submodular theory, we reformulate the quadratic energy minimization problem as a total variation denoising problem, which, when viewed geometrically, enables the use of projection- and reflection-based convex methods. The resulting min-cut algorithm (and code) is conceptually very simple, and solves a sequence of TV denoising problems. We perform an extensive empirical evaluation comparing state-of-the-art combinatorial algorithms and convex optimization techniques. On small problems the iterative convex methods match the combinatorial max-flow algorithms, while on larger problems they offer flexibility and important gains: (a) their memory footprint is small; (b) their straightforward parallelizability fits multi-core platforms; (c) they can easily be warm-started; and (d) they quickly reach approximately good solutions, thereby enabling faster “inexact” solutions. A key consequence of our approach based on submodularity and convexity is that it allows one to combine *any arbitrary combinatorial or convex methods as subroutines*, yielding hybrid combinatorial and convex optimization algorithms that benefit from the strengths of both.
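The TV denoising subproblem at the heart of this reformulation is itself a convex problem with a very simple solver. Below is a minimal 1D sketch, assuming a standard projected-gradient scheme on the dual (this is a generic illustration, not the paper's code; the function name is hypothetical).

```python
import numpy as np

def tv_denoise_1d(y, lam, n_iter=2000, step=0.25):
    # 1D total-variation denoising: min_x 0.5 ||x - y||^2 + lam * TV(x),
    # solved by projected gradient on the dual, where x = y - D^T u and
    # the dual variable u lives in the box |u_j| <= lam.
    D = lambda x: np.diff(x)                                   # forward differences
    Dt = lambda u: np.concatenate([[-u[0]], -np.diff(u), [u[-1]]])
    u = np.zeros(len(y) - 1)
    for _ in range(n_iter):
        # gradient ascent step on the dual, then projection onto the box
        u = np.clip(u + step * D(y - Dt(u)), -lam, lam)
    return y - Dt(u)

# Denoising a step signal shrinks the two plateaus toward each other
# by lam / (block size), a classical TV shrinkage effect.
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
x_hat = tv_denoise_1d(y, lam=0.3)
```

The step size 0.25 is 1/||D||^2, the standard safe choice for the dual gradient iteration; on images, D becomes the 2D discrete gradient and the same scheme applies.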

Collaboration with the Indian Institute of Science, Bangalore, India.

Recent literature suggests that embedding a graph on a unit sphere leads to better generalization for graph transduction. However, the choice of an optimal embedding and an efficient algorithm to compute it remain open. In this paper, we show that orthonormal representations, a class of unit-sphere graph embeddings, are PAC learnable. Existing PAC-based analyses do not apply, as the VC dimension of the function class is infinite. We propose an alternative PAC-based bound, which does not depend on the VC dimension of the underlying function class but is related to the famous Lovász function. The main contribution of the paper is SPORE, a SPectral regularized ORthonormal Embedding for graph transduction, derived from the PAC bound. SPORE is posed as a non-smooth convex function over an elliptope. These problems are usually solved as semi-definite programs (SDPs) with time complexity

Collaboration with Hugo Raguet.

Collaboration with Afonso S. Bandeira and Amit Singer.

Many maximum likelihood estimation problems are, in general, intractable optimization problems. As a result, it is common to approximate the maximum likelihood estimator (MLE) using convex relaxations. Semidefinite relaxations are among the most popular. Sometimes, the relaxations turn out to be tight. In this paper, we study such a phenomenon. The angular synchronization problem consists in estimating a collection of n phases, given noisy measurements of some of the pairwise relative phases. The MLE for the angular synchronization problem is the solution of a (hard) non-bipartite Grothendieck problem over the complex numbers. It is known that its semidefinite relaxation enjoys worst-case approximation guarantees. In this paper, we consider a stochastic model on the input of that semidefinite relaxation. We assume there is a planted signal (corresponding to a ground truth set of phases) and the measurements are corrupted with random noise. Even though the MLE does not coincide with the planted signal, we show that the relaxation is, with high probability, tight. This holds even for high levels of noise. This analysis explains, for the interesting case of angular synchronization, a phenomenon which has been observed without explanation in many other settings. Namely, the fact that even when exact recovery of the ground truth is impossible, semidefinite relaxations for the MLE tend to be tight (in favorable noise regimes).
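To fix ideas, the MLE and its relaxation can be written as follows (standard formulation of angular synchronization, paraphrased rather than quoted from the paper):

```latex
% Measurements: C_{k\ell} \approx z_k \bar z_\ell for ground-truth phases
% z \in \mathbb{C}^n with |z_k| = 1, corrupted by random noise.
\text{MLE:}\quad \max_{z \in \mathbb{C}^n,\ |z_k| = 1} \; z^* C z
\qquad\Longrightarrow\qquad
\text{SDP:}\quad \max_{Z \succeq 0,\ Z_{kk} = 1} \; \operatorname{Tr}(C Z).
```

The relaxation replaces the rank-one matrix Z = z z^* by an arbitrary positive semidefinite Z with unit diagonal; "tightness" means the SDP optimum is nevertheless rank one, so the estimated phases can be read off its leading eigenvector.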

Collaboration with Matthew H. Seaberg and Joshua J. Turner.

Coherent diffractive imaging (CDI) provides new opportunities for high resolution X-ray imaging with simultaneous amplitude and phase contrast. Extensions to CDI broaden the scope of the technique for use in a wide variety of experimental geometries and physical systems. Here, we experimentally demonstrate a new extension to CDI that encodes additional information through the use of a series of randomly coded masks. The information gained from the few additional diffraction measurements removes the need for typical object-domain constraints; the algorithm uses prior information about the masks instead. The experiment is performed using a laser diode at 532.2 nm, enabling rapid prototyping for future X-ray synchrotron and even free electron laser experiments. Diffraction patterns are collected with up to 15 different masks placed between a CCD detector and a single sample. Phase retrieval is performed using a convex relaxation routine known as “PhaseCut” followed by a variation on Fienup's input-output algorithm. The reconstruction quality is judged via calculation of phase retrieval transfer functions as well as by an object-space comparison between reconstructions and a lens-based image of the sample. The results of this analysis indicate that with enough masks (in this case 3 or 4) the diffraction phases converge reliably, implying stability and uniqueness of the retrieved solution.

Renegar's condition number is a data-driven computational complexity measure for convex programs, generalizing classical condition numbers in linear systems. In this work, we provide evidence that for a broad class of compressed sensing problems, the worst-case value of this algorithmic complexity measure, taken over all signals, matches the restricted eigenvalue of the observation matrix, which controls compressed sensing performance. This means that, in these problems, a single parameter directly controls both computational complexity and recovery performance.

Collaboration with Rodolphe Jenatton.

Seriation seeks to reconstruct a linear order between variables using unsorted similarity information. It has direct applications in archeology and shotgun gene sequencing, for example. In this work, we prove the equivalence between seriation and the combinatorial 2-sum problem (a quadratic minimization problem over permutations) over a class of similarity matrices. The seriation problem can be solved exactly by a spectral algorithm in the noiseless case, and we produce a convex relaxation for the 2-sum problem to improve the robustness of solutions in a noisy setting. This relaxation also allows us to impose additional structural constraints on the solution, in order to solve semi-supervised seriation problems. We present numerical experiments on archeological data, Markov chains and gene sequences.
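The spectral algorithm mentioned for the noiseless case is short enough to sketch: order the items by the Fiedler vector of the Laplacian of the similarity matrix. This is a generic illustration of that classical step (Atkins et al.), not the paper's code; the function name is hypothetical.

```python
import numpy as np

def spectral_seriation(S):
    # Recover an ordering from a similarity matrix via the Fiedler vector
    # (eigenvector of the second-smallest Laplacian eigenvalue); exact for
    # noiseless Robinson similarity matrices.
    L = np.diag(S.sum(axis=1)) - S
    eigvals, eigvecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]
    return np.argsort(fiedler)

# Similarity decaying with distance along a hidden linear order, observed
# in shuffled form; seriation should undo the shuffle (up to reversal).
n = 8
true_order = np.random.default_rng(0).permutation(n)
S_sorted = np.exp(-np.abs(np.subtract.outer(np.arange(n), np.arange(n))))
S = S_sorted[np.ix_(true_order, true_order)]
perm = spectral_seriation(S)
```

In the noisy and semi-supervised settings, the paper replaces this spectral step by the convex relaxation of the 2-sum problem described above.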

Collaboration with Irène Waldspurger and Stéphane Mallat.

Phase retrieval seeks to recover a signal from the magnitudes of linear measurements of it.

Collaboration with Matthieu Lerasle.

Collaboration with Joon Kwon.

Collaboration with Philippe Rigollet, Sylvain Chassang and Erik Snowberg.

Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches gives close to minimax-optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.

Collaboration with Jonathan Weed and Philippe Rigollet.

Microsoft Research: “Structured Large-Scale Machine Learning”. Machine learning is now ubiquitous in industry, science, engineering, and personal life. While early successes were obtained by applying off-the-shelf techniques, there are two main challenges faced by machine learning in the “big data” era: structure and scale. The project proposes to explore three axes, from theoretical, algorithmic and practical perspectives: (1) large-scale convex optimization, (2) large-scale combinatorial optimization and (3) sequential decision making for structured data. The project involves two Inria sites (Paris and Grenoble) and four MSR sites (Cambridge, New England, Redmond, New York). Project website: http://

A. d'Aspremont, IdR EMMA, Euroclear – Institut Louis Bachelier.

A. d'Aspremont, IBM Faculty Award.

A. d'Aspremont and F. Bach, IdR AXA “Machine Learning”, chaire Havas-ILB, Economie des Nouvelles Données.

A. d'Aspremont and F. Bach, Comité de pilotage, chaire Havas – Dauphine “Economie des nouvelles données”.

S. Lacoste-Julien (with J. Sivic and I. Laptev in Willow project-team), Google Research Award “Structured Learning from Video and Natural Language”.

Title: Statistical calibration

Coordinator: University Paris Dauphine

Leader: Vincent Rivoirard

Other members: 34 members, mostly among CEREMADE (Paris Dauphine), Laboratoire Jean-Alexandre Dieudonné (Université de Nice) and Laboratoire de Mathématiques de l'Université Paris Sud

Instrument: ANR Blanc

Duration: Jan 2012 - Dec 2015

Total funding: 240 000 euros

Title: BeFast

Coordinator: University Lille 1

Leader: Alain Celisse

Other members: Tristan Mary-Huard, Guillem Rigaill, Guillemette Marot, and Julien Chiquet.

Instrument: PEPS

Duration: Mar 2015 – Dec 2015

Total funding: 9 000 euros

Type: FP7

Defi: NC

Instrument: ERC Starting Grant

Duration: May 2011 - May 2016

Coordinator: A. d’Aspremont (CNRS)

Abstract: Interior point algorithms and a dramatic growth in computing power have revolutionized optimization in the last two decades. Highly nonlinear problems which were previously thought intractable are now routinely solved at reasonable scales. Semidefinite programs (i.e. linear programs on the cone of positive semidefinite matrices) are a perfect example of this trend: reasonably large, highly nonlinear but convex eigenvalue optimization problems are now solved efficiently by reliable numerical packages. This in turn means that a wide array of new applications for semidefinite programming have been discovered, mimicking the early development of linear programming. To cite only a few examples, semidefinite programs have been used to solve collaborative filtering problems (e.g. make personalized movie recommendations), approximate the solution of combinatorial programs, optimize the mixing rate of Markov chains over networks, infer dependence patterns from multivariate time series or produce optimal kernels in classification problems. These new applications also come with radically different algorithmic requirements. While interior point methods solve relatively small problems with a high precision, most recent applications of semidefinite programming in statistical learning for example form very large-scale problems with comparatively low precision targets, programs for which current algorithms cannot form even a single iteration. This proposal seeks to break this limit on problem size by deriving reliable first-order algorithms for solving large-scale semidefinite programs with a significantly lower cost per iteration, using for example subsampling techniques to considerably reduce the cost of forming gradients. 
Beyond these algorithmic challenges, the proposed research will focus heavily on applications of convex programming to statistical learning and signal processing theory where optimization and duality results quantify the statistical performance of coding or variable selection algorithms for example. Finally, another central goal of this work will be to produce efficient, customized algorithms for some key problems arising in machine learning and statistics.

Title: Sparse Representations and Compressed Sensing Training Network

Type: FP7

Defi: NC

Instrument: Initial Training Network

Duration: October 2014 to October 2018

Coordinator: Mark Plumbley (University of Surrey)

Inria contact: Francis Bach

Abstract: The SpaRTaN Initial Training Network will train a new generation of interdisciplinary researchers in sparse representations and compressed sensing, contributing to Europe’s leading role in scientific innovation.

By bringing together leading academic and industry groups with expertise in sparse representations, compressed sensing, machine learning and optimisation, and with an interest in applications such as hyperspectral imaging, audio signal processing and video analytics, this project will create an interdisciplinary, trans-national and inter-sectorial training network to enhance mobility and training of researchers in this area.

SpaRTaN is funded under the FP7-PEOPLE-2013-ITN call and is part of the Marie Curie Actions — Initial Training Networks (ITN) funding scheme: Project number - 607290

Title: Machine Sensing Training Network

Type: H2020

Instrument: Initial Training Network

Duration: January 2015 - January 2019

Coordinator: Mark Plumbley (University of Surrey)

Inria contact: Francis Bach

Abstract: The aim of this Innovative Training Network is to train a new generation of creative, entrepreneurial and innovative early stage researchers (ESRs) in the research area of measurement and estimation of signals using knowledge or data about the underlying structure.

We will develop new robust and efficient Machine Sensing theory and algorithms, together with methods for a wide range of signals, including: advanced brain imaging; inverse imaging problems; audio and music signals; and non-traditional signals such as signals on graphs. We will apply these methods to real-world problems, through work with non-academic partners, and disseminate the results of this research to a wide range of academic and non-academic audiences, including through publications, data, software and public engagement events.

MacSeNet is funded under the H2020-MSCA-ITN-2014 call and is part of the Marie Sklodowska-Curie Actions — Innovative Training Networks (ITN) funding scheme.

Visit from Chiranjib Bhattacharyya, Indian Institute of Science, Bangalore, May 2014.

Visit from Raman Sankaran, Indian Institute of Science, Bangalore, January 2014.

F. Bach, program co-chair for International Conference on Machine Learning (ICML), 2015.

A. d'Aspremont, workshop organizer, “Optimisation pour l’apprentissage statistique”, École de Physique des Houches, January 2015.

V. Perchet, local chair for the main conference in learning theory, the “Conference On Learning Theory – COLT'15”, July 2015.

V. Perchet, workshop organizer, “Challenges in Optimization for Data Science”, July 2015.

V. Perchet, organizer of the conference “French Symposium on Games - Theory and Applications”, May 2015.

S. Lacoste-Julien, International Conference on Machine Learning (ICML), 2015.

A. d'Aspremont, SIAM Journal on Optimization, Associate Editor.

F. Bach, Journal of Machine Learning Research, Action Editor.

F. Bach, Information and Inference, Associate Editor.

F. Bach, SIAM Journal on Imaging Sciences, Associate Editor.

A. d’Aspremont is on the scientific committee for Programme Gaspard Monge pour l’Optimisation, Fondation mathématique Jacques Hadamard.

S. Lacoste-Julien received a NIPS 2015 Outstanding Reviewer Award.

S. Arlot, “V-fold selection of kernel estimators", EMS 2015: European Meeting of Statisticians, session `Recent advances in resampling methods' (Amsterdam, July 6, 2015).

S. Arlot, “Cross-validation for estimator selection", Workshop “Information Based Complexity and Model Selection" (IHP, Paris, April 9, 2015).

S. Arlot, “Comparaison de procédures de validation croisée (V-fold)", Conférence "Calibration statistique" (Université de Nice - Sophia Antipolis, February 20, 2015).

S. Arlot, “Cross-validation for estimator selection", SMPGD 2015: Statistical Methods for Post Genomic Data (Ludwig-Maximilians University, February 13, 2015).

S. Arlot, “Sélection d'estimateurs par validation croisée", Colloquium du MAP5 (Université Paris Descartes, November 13, 2015).

S. Arlot, “Analyse du biais de forêts purement aléatoires" (Analysis of the bias of purely random forests), Séminaire de Probabilités et Statistiques (Laboratoire JA Dieudonné, Université de Nice - Sophia Antipolis, January 8, 2015).

A. d'Aspremont, “Ranking from Pairwise Comparisons using Seriation”:

Séminaire du Collège de France, June 2015.

International Symposium on Mathematical Programming, July 2015.

STATLEARN 2015.

Fields Institute workshop on big data, Toronto, Feb. 2015.

A. d'Aspremont, “Seriation, DNA Sequencing and Nanopores”, Datalead 2015.

A. d'Aspremont, “Optimisation et apprentissage”, Conférence SCOR - Institut des actuaires, December 2015.

F. Bach, Invited presentation at Workshop on Data-driven Algorithmics, Harvard, September 2015.

F. Bach, Invited tutorial at Allerton Conference, Urbana-Champaign, 2015.

F. Bach, Invited seminar at Ecole Polytechnique Federale de Lausanne, October 2015.

D. Garreau, “Kernel change-point detection", 45th Ecole d'été de Saint-Flour (Saint-Flour, July 7-17, 2015).

S. Lacoste-Julien, “Sequential Kernel Herding: Frank-Wolfe Optimization for Particle Filtering", invited talk at the Bayes in Paris Seminar at ENSAE, Paris, France, March 2015.

S. Lacoste-Julien, “SAGA: a fast incremental gradient method", invited talk in the Nonsmooth Optimization cluster at ISMP 2015 (22nd International Symposium on Mathematical Programming), Pittsburgh, PA, USA, July 2015.

S. Lacoste-Julien, “Frank-Wolfe Optimization for Structured Machine Learning”:

McGill University, Montreal, Canada, December 2015.

Université de Montréal, Montreal, Canada, December 2015.

Master : S. Arlot and F. Bach, “Statistical learning", 24h, M2, Université Paris-Sud.

Master : A. d'Aspremont is co-director of the Master MASH (Mathématiques, Apprentissage et Sciences Humaines) at Paris Sciences et Lettres.

Master : A. d'Aspremont, “Optimisation et apprentissage”, 21h, M2 (MVA), ENS Cachan.

Master : A. d'Aspremont, “Optimisation convexe”, 21h, M2, Ecole Normale Supérieure.

Master : S. Lacoste-Julien and F. Bach, “Apprentissage statistique”, 35h, M1, Ecole Normale Supérieure.

Master : S. Lacoste-Julien and F. Bach (together with G. Obozinski), 30h, M2 (MVA), ENS Cachan.

Master : S. Lacoste-Julien, “Projets informatiques”, 20h, M2 (MASH), Université de Paris-Dauphine.

PhD : Fabian Pedregosa, “Feature extraction and supervised learning on fMRI: from practice to theory”, UPMC, February 20th, 2015, F. Bach, co-advised with A. Gramfort (Telecom).

PhD : Rémi Lajugie, “Structured prediction for sequential data”, UPMC, September 18th, 2015, S. Arlot and F. Bach.

PhD : Fajwel Fogel, “Convex and spectral relaxations for phase retrieval, seriation and ranking”, Ecole Polytechnique, November 18th, 2015, A. d'Aspremont and F. Bach.

PhD in progress: Thomas Schatz, September 2012, F. Bach, co-advised with E. Dupoux (ENS, cognitive sciences).

PhD in progress: Sesh Kumar, September 2013, F. Bach.

PhD in progress: Rafael Rezende, September 2013, F. Bach, co-advised with J. Ponce.

PhD in progress: Anastasia Podosinnikova, December 2013, F. Bach and S. Lacoste-Julien.

PhD in progress: Christophe Dupuy, January 2014, F. Bach, co-advised with C. Diot (Technicolor).

PhD in progress: Jean-Baptiste Alayrac, September 2014, S. Lacoste-Julien, co-advised with J. Sivic and I. Laptev.

PhD in progress: Aymeric Dieuleveut, September 2014, F. Bach.

PhD in progress: Nicolas Flammarion, September 2014, A. d’Aspremont and F. Bach.

PhD in progress: Damien Garreau, September 2014, S. Arlot (co-advised with G. Biau).

PhD in progress: Vincent Roulet, October 2014, A. d’Aspremont.

PhD in progress: Anaël Bonneton, December 2014, F. Bach, located in Agence nationale de la sécurité des systèmes d'information (ANSSI).

PhD in progress: Dmitry Babichev, September 2015, F. Bach, co-advised with A. Juditsky (Univ. Grenoble).

PhD in progress: Rémi Leblond, September 2015, S. Lacoste-Julien.

PhD in progress: Antoine Recanati, September 2015, A. d'Aspremont.

PhD in progress: Damien Scieur, September 2015, A. d'Aspremont and F. Bach.

PhD in progress: Tatiana Shpakova, September 2015, F. Bach.

S. Arlot, member of the PhD Committee for Maud Thomas at Université Paris Diderot on July 2, 2015.

F. Bach, member of the HDR committee of Jeremie Mary (Univ. Lille).

S. Lacoste-Julien, general public talk “Apprentissage automatique et big data", at La Maison des Sciences, Châtenay-Malabry, France, September 2015.