Non-parametric Models for Non-negative Functions

SIERRA Statistical Machine Learning and Parsimony

Optimization, machine learning and statistical methods

Applied Mathematics, Computation and Simulation

http://team.inria.fr/sierra

Département d'Informatique de l'Ecole Normale Supérieure

CNRS, Ecole normale supérieure de Paris

Creation of the Team: 2011 January 01, updated into Project-Team: 2012 January 01

A3.4. - Machine learning and statistics
A5.4. - Computer vision
A6.2. - Scientific computing, Numerical Analysis & Optimization
A7.1. - Algorithms
A8.2. - Optimization
A9.2. - Machine learning
B9.5.6. - Data science

Paris

Research Scientists
Francis Bach, Team leader, Inria, Senior Researcher
Pierre Gaillard, Inria, Researcher, until Aug 2020
Alessandro Rudi, Inria, Researcher
Umut Simsekli, Inria, Researcher, from Nov 2020
Adrien Taylor, Inria, Starting Research Position
Alexandre d'Aspremont, CNRS, Senior Researcher

Post-Doctoral Fellows
Martin Arjovsky, Inria
Alberto Bietti, Inria, until Aug 2020
Seyed Daneshmand, Inria, from Aug 2020
Remy Degenne, Inria, until Sep 2020
Ziad Kobeissi, Institut Louis Bachelier, from Oct 2020
Pierre-Yves Masse, Czech Technical University in Prague, Czech Republic, until Mar 2020
Boris Muzellec, Inria, from Nov 2020
Yifan Sun, École Normale Supérieure de Paris, until Aug 2020

PhD Students
Mathieu Barre, École Normale Supérieure de Paris
Eloise Berthier, DGA
Raphael Berthier, Inria
Margaux Bregere, EDF, until Oct 2020
Vivien Cabannes, Inria
Alexandre Defossez, Facebook, until Jun 2020
Radu Alexandru Dragomir, École polytechnique
Gautier Izacard, CNRS, from Feb 2020
Remi Jezequel, École Normale Supérieure de Paris
Thomas Kerdreux, École polytechnique, PhD completed in Sept. 2020
Hans Kersting, Inria, from Oct 2020
Marc Lambert, DGA, from Sep 2020
Ulysse Marteau-Ferey, Inria
Gregoire Mialon, Inria
Alex Nowak Vila, École Normale Supérieure de Paris
Loucas Pillaud Vivien, Ministère de l'Ecologie, de l'Energie, du Développement durable et de la Mer, until Aug 2020
Manon Romain, CNRS, from Sep 2020

Technical Staff
Loïc Estève, Inria, Engineer, until Feb 2020
Gautier Izacard, CNRS, Engineer, until Jan 2020

Interns
Stanislas Bénéteau, Ecole normale supérieure Paris-Saclay, from Apr 2020 until Aug 2020
Celine Moucer, École polytechnique, from Apr 2020 until Aug 2020
Quentin Rebjock, Inria, until Mar 2020

Administrative Assistants
Helene Bessin Rousseau, Inria, until Jun 2020
Helene Milome, Inria
Scheherazade Rouag, Inria, from Nov 2020

Visiting Scientists
Anant Raj, Max Planck Institute, until Mar 2020
Manon Romain, CNRS, from Jun 2020 until Aug 2020
Aadirupa Saha, Indian Institute of Science, until Jan 2020

Overall objectives

Statement

Machine learning is a recent scientific domain, positioned between applied mathematics, statistics and computer science. Its goals are the optimization, control, and modeling of complex systems from examples. It applies to data from numerous engineering and scientific fields (e.g., vision, bioinformatics, neuroscience, audio processing, text processing, economy, finance, etc.), the ultimate goal being to derive general theories and algorithms allowing advances in each of these domains. Machine learning is characterized by the high quality and quantity of the exchanges between theory, algorithms and applications: interesting theoretical problems almost always emerge from applications, while theoretical analysis allows the understanding of why and when popular or successful algorithms do or do not work, and leads to proposing significant improvements.

Our academic positioning is exactly at the intersection between these three aspects—algorithms, theory and applications—and our main research goal is to make the link between theory and algorithms, and between algorithms and high-impact applications in various engineering and scientific fields, in particular computer vision, bioinformatics, audio processing, text processing and neuro-imaging.

Machine learning is now a vast field of research and the team focuses on the following aspects: supervised learning (kernel methods, calibration), unsupervised learning (matrix factorization, statistical tests), parsimony (structured sparsity, theory and algorithms), and optimization (convex optimization, bandit learning). These four research axes are strongly interdependent, and the interplay between them is key to successful practical applications.

Research program

Supervised Learning

This part of our research focuses on methods where, given a set of examples of input/output pairs, the goal is to predict the output for a new input, with research on kernel methods, calibration methods, and multi-task learning.

Unsupervised Learning

We focus here on methods where no output is given and the goal is to find structure of certain known types (e.g., discrete or low-dimensional) in the data, with a focus on matrix factorization, statistical tests, dimension reduction, and semi-supervised learning.

Parsimony

The concept of parsimony is central to many areas of science. In the context of statistical machine learning, this takes the form of variable or feature selection. The team focuses primarily on structured sparsity, with theoretical and algorithmic contributions.

Optimization

Optimization in all its forms is central to machine learning, as many of its theoretical frameworks are based at least in part on empirical risk minimization. The team focuses primarily on convex and bandit optimization, with particular emphasis on large-scale optimization.

Application domains

Applications for Machine Learning

Machine learning research can be conducted from two main perspectives: the first one, which has been dominant in the last 30 years, is to design learning algorithms and theories which are as generic as possible, the goal being to make as few assumptions as possible regarding the problems to be solved and to let data speak for themselves. This has led to many interesting methodological developments and successful applications. However, we believe that this strategy has reached its limit for many application domains, such as computer vision, bioinformatics, neuro-imaging, text and audio processing, which leads to the second perspective our team is built on: research in machine learning theory and algorithms should be driven by interdisciplinary collaborations, so that specific prior knowledge may be properly introduced into the learning process, in particular with the following fields:

Computer vision: object recognition, object detection, image segmentation, image/video processing, computational photography. In collaboration with the Willow project-team.

Bioinformatics: cancer diagnosis, protein function prediction, virtual screening. In collaboration with Institut Curie.

Text processing: document collection modeling, language models.

Audio processing: source separation, speech/music processing.

Neuro-imaging: brain-computer interface (fMRI, EEG, MEG).

Highlights of the year

A. Rudi: Recipient of an ERC starting grant

F. Bach: Elected to the French Academy of Sciences

F.P. Paty, A. d'Aspremont, M. Cuturi: AISTATS 2020 notable paper award

New results

Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss

Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based methods are observed to perform well in many supervised classification tasks. Towards understanding this phenomenon, we analyze the training and generalization behavior of infinitely wide two-layer neural networks with homogeneous activations. We show that the limits of the gradient flow on exponentially tailed losses can be fully characterized as a max-margin classifier in a certain non-Hilbertian space of functions. In the presence of hidden low-dimensional structures, the resulting margin is independent of the ambient dimension, which leads to strong generalization bounds. In contrast, training only the output layer implicitly solves a kernel support vector machine, which a priori does not enjoy such adaptivity. Our analysis of training is non-quantitative in terms of running time, but we prove computational guarantees in simplified settings by showing equivalences with online mirror descent. Finally, numerical experiments suggest that our analysis describes well the practical behavior of two-layer neural networks with ReLU activations and confirm the statistical benefits of this implicit bias.
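As an illustration of this setting (not the authors' code or experiments), the NumPy sketch below trains both layers of a finite two-layer ReLU network $f(x) = \frac{1}{m}\sum_j b_j \max(0, \langle w_j, x\rangle)$ by full-batch gradient descent on the logistic loss. It records the normalized margin, $\min_i y_i f(x_i)$ divided by $\frac{1}{2m}\sum_j (\|w_j\|^2 + b_j^2)$. This is the natural normalization because $f$ is 2-homogeneous in $(w_j, b_j)$. The width, step size, toy data and mean-field step scaling are all assumptions. A finite network trained for finitely many steps only approximates the infinite-width gradient flow analyzed above.

```python
import numpy as np


def train_two_layer(X, y, width=200, lr=0.5, steps=1000, seed=0):
    """Full-batch GD on both layers of f(x) = (1/m) * sum_j b_j * relu(<w_j, x>)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(size=(width, d))  # hidden-layer weights w_j
    b = rng.normal(size=width)       # output weights b_j
    losses, margins = [], []
    for _ in range(steps):
        H = X @ W.T                  # pre-activations, shape (n, m)
        A = np.maximum(H, 0.0)       # ReLU features
        f = A @ b / width            # outputs with mean-field 1/m scaling
        z = y * f
        losses.append(np.mean(np.logaddexp(0.0, -z)))  # log(1 + exp(-z))
        # f is 2-homogeneous in (W, b): normalize by the matching squared norm.
        sq_norm = (np.sum(W ** 2) + np.sum(b ** 2)) / (2 * width)
        margins.append(np.min(z) / sq_norm)
        g = -y * 0.5 * (1.0 - np.tanh(z / 2)) / n  # dLoss/df_i = -y_i * sigmoid(-z_i) / n
        grad_b = A.T @ g / width
        grad_W = ((g[:, None] * (H > 0)) * b[None, :]).T @ X / width
        # Multiply the step by m so that each neuron moves O(1) (mean-field time scale).
        b -= lr * width * grad_b
        W -= lr * width * grad_W
    return losses, margins


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = np.where(X[:, 0] > 0, 1.0, -1.0)  # labels depend on a 1-D projection only
losses, margins = train_two_layer(X, y)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, normalized margin {margins[-1]:.3f}")
```

By the inequality $|b_j|\,\|w_j\| \le (b_j^2 + \|w_j\|^2)/2$, this normalized margin can never exceed $\max_i \|x_i\|$, which gives a quick sanity check for any run.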

Learning with Differentiable Perturbed Optimizers

Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g., sorting, picking closest neighbors, or shortest paths). Although these discrete decisions are easily computed, they break the back-propagation of computational graphs. In order to expand the scope of learning problems that can be solved in an end-to-end fashion, we propose a systematic method to transform optimizers into operations that are differentiable and never locally constant. Our approach relies on stochastically perturbed optimizers, and can be used readily together with existing solvers. Their derivatives can be evaluated efficiently, and smoothness tuned via the chosen noise amplitude. We also show how this framework can be connected to a family of losses developed in structured prediction, and give theoretical guarantees for their use in learning tasks. We demonstrate experimentally the performance of our approach on various tasks.
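Below is a minimal Monte Carlo sketch of a stochastically perturbed optimizer. It assumes the discrete solver is an argmax over one-hot vectors and the noise $Z$ is standard Gaussian; both choices are ours, for illustration. The smoothed output is $y_\varepsilon(\theta) = \mathbb{E}[\arg\max_y \langle \theta + \varepsilon Z, y\rangle]$. For Gaussian noise, its Jacobian can be estimated from the same samples as $\mathbb{E}[y^*(\theta + \varepsilon Z) Z^\top]/\varepsilon$, so the solver is only ever called as a black box. The function name `perturbed_argmax` and its defaults are hypothetical.

```python
import numpy as np


def perturbed_argmax(theta, epsilon=0.5, num_samples=20000, seed=0):
    """Monte Carlo estimate of the perturbed optimizer
        y_eps(theta) = E[argmax_y <theta + eps * Z, y>],  Z ~ N(0, I),
    over one-hot vectors y, and of its Jacobian E[y*(theta + eps * Z) Z^T] / eps.
    """
    rng = np.random.default_rng(seed)
    k = theta.shape[0]
    Z = rng.normal(size=(num_samples, k))                  # Gaussian perturbations
    idx = np.argmax(theta[None, :] + epsilon * Z, axis=1)  # call the discrete solver per sample
    Y = np.eye(k)[idx]                                     # one-hot solutions, shape (N, k)
    y_eps = Y.mean(axis=0)                                 # smoothed, differentiable output
    jac = Y.T @ Z / (num_samples * epsilon)                # jac[i, j] ~ d y_eps[i] / d theta[j]
    return y_eps, jac


theta = np.array([2.0, 1.5, -1.0])
y_eps, jac = perturbed_argmax(theta)
print(np.round(y_eps, 3))
print(np.round(jac, 3))
```

Increasing `epsilon` makes `y_eps` smoother and its Jacobian better behaved, at the cost of a larger bias with respect to the unperturbed argmax. With Gumbel instead of Gaussian noise, the perturbed argmax over one-hot vectors coincides with $\mathrm{softmax}(\theta/\varepsilon)$ (the Gumbel-max trick).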

Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization

We consider the setting of distributed empirical risk minimization where multiple machines compute the gradients in parallel and a centralized server updates the model parameters. In order to reduce the number of communications required to reach a given accuracy, we propose a preconditioned accelerated gradient method where the preconditioning is done by solving a local optimization problem over a subsampled dataset at the server. The convergence rate of the method depends on the square root of the relative condition number between the global and local loss functions. We estimate the relative condition number for linear prediction models by studying uniform concentration of the Hessians over a bounded domain, which allows us to derive improved convergence rates for existing preconditioned gradient methods and our accelerated method. Experiments on real-world datasets illustrate the benefits of acceleration in the ill-conditioned regime.
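The preconditioning idea can be sketched on a least-squares problem, an assumption that keeps every step explicit. For a quadratic loss, the Bregman (mirror) step with reference function $\phi(w) = f_{\text{local}}(w) + \frac{\mu}{2}\|w\|^2$ reduces to multiplying the global gradient by the inverse of the server's local Hessian. This is the plain, non-accelerated preconditioned iteration; the accelerated method described above is not reproduced. The workers are simulated by a single matrix product, and `n_local`, `mu` and the synthetic data are hypothetical.

```python
import numpy as np


def preconditioned_gd(X, y, n_local=200, mu=1e-5, steps=100, seed=0):
    """Statistically preconditioned (non-accelerated) gradient descent for
    F(w) = ||Xw - y||^2 / (2n), with phi(w) = f_local(w) + mu/2 ||w||^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    H = X.T @ X / n                                   # Hessian of the global loss F
    b = X.T @ y / n
    S = rng.choice(n, size=n_local, replace=False)    # subsample held by the server
    H0 = X[S].T @ X[S] / n_local + mu * np.eye(d)     # Hessian of phi
    # Relative smoothness constant of F w.r.t. phi: largest eigenvalue of H0^{-1} H.
    L_rel = np.max(np.linalg.eigvals(np.linalg.solve(H0, H)).real)
    w = np.zeros(d)
    for _ in range(steps):
        grad = H @ w - b          # global gradient (averaged over workers in practice)
        # For quadratics, argmin_v <grad, v> + L_rel * D_phi(v, w) is a linear solve with H0.
        w = w - np.linalg.solve(H0, grad) / L_rel
    return w, L_rel


rng = np.random.default_rng(0)
n, d = 2000, 20
X = rng.normal(size=(n, d)) * np.logspace(0, -1.5, d)   # ill-conditioned features
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
w, L_rel = preconditioned_gd(X, y)
w_star = np.linalg.solve(X.T @ X, X.T @ y)
print(np.linalg.norm(w - w_star) / np.linalg.norm(w_star))
```

The step size $1/L$ uses the relative smoothness constant $L = \lambda_{\max}(H_0^{-1}H)$. The number of iterations needed is governed by the relative condition number $\lambda_{\max}/\lambda_{\min}$ of $H_0^{-1}H$. That ratio stays small when the subsample is representative, even though $H$ itself is ill-conditioned here.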

Batch Normalization Provably Avoids Rank Collapse for Randomly Initialised Deep Networks

Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used. Here we investigate this phenomenon by revisiting the connection between random initialization in deep networks and spectral instabilities in products of random matrices. Given the rich literature on random matrices, it is not surprising to find that the rank of the intermediate representations in unnormalized networks collapses quickly with depth. In this work we highlight the fact that batch normalization is an effective strategy to avoid rank collapse for both linear and ReLU networks. Leveraging tools from Markov chain theory, we derive a meaningful lower bound on the rank in deep linear networks. Empirically, we also demonstrate that this rank robustness generalizes to ReLU nets. Finally, we conduct an extensive set of experiments on real-world data sets, which confirm that rank stability is indeed a crucial condition for training modern-day deep neural architectures.
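The following NumPy sketch illustrates the setting under our own assumptions; it is not the paper's experimental protocol. It propagates a batch through a deep random linear network, optionally with a batch-normalization step that centers and rescales each feature across the batch. It then reports the stable rank $\|H\|_F^2/\|H\|_2^2$ of the final representation as a smooth proxy for its rank. Depth, width, batch size, and the use of stable rank as the rank measure are hypothetical choices.

```python
import numpy as np


def stable_rank(H):
    """||H||_F^2 / ||H||_2^2: a smooth proxy for rank, between 1 and rank(H)."""
    s = np.linalg.svd(H, compute_uv=False)
    return float(np.sum(s ** 2) / s[0] ** 2)


def deep_linear_rank(depth=100, width=32, batch=32, batch_norm=False, seed=0):
    rng = np.random.default_rng(seed)
    H = rng.normal(size=(width, batch))              # column j = representation of input j
    for _ in range(depth):
        W = rng.normal(size=(width, width)) / np.sqrt(width)   # random Gaussian layer
        H = W @ H
        if batch_norm:
            # Normalize each feature (row) to zero mean and unit variance over the batch.
            H = H - H.mean(axis=1, keepdims=True)
            H = H / (H.std(axis=1, keepdims=True) + 1e-8)
    return stable_rank(H)


r_plain = deep_linear_rank(batch_norm=False)
r_bn = deep_linear_rank(batch_norm=True)
print(f"stable rank without BN: {r_plain:.2f}, with BN: {r_bn:.2f}")
```

Both runs draw the same random weights, because the normalization step consumes no randomness. The printed values therefore isolate the effect of the normalization.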

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation $Y = \langle \theta_*, \Phi(U) \rangle$ between the random output $Y$ and the random feature vector $\Phi(U)$, a potentially non-linear transformation of the inputs $U$. We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model. The convergence of the iterates to the optimum $\theta_*$ and the decay of the generalization error follow polynomial convergence rates with exponents that both depend on the regularities of the optimum $\theta_*$ and of the feature vectors $\Phi(U)$. We interpret our result in the reproducing kernel Hilbert space framework. As a special case, we analyze an online algorithm for estimating a real function on the unit hypercube from the noiseless observation of its value at randomly sampled points; the convergence depends on the Sobolev smoothness of the function and of a chosen kernel. Finally, we apply our analysis beyond the supervised learning setting to obtain convergence rates for the averaging process (a.k.a. gossip algorithm) on a graph depending on its spectral dimension.
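Here is a small simulation of this setting. It assumes finite-dimensional Gaussian features $\Phi$ whose covariance has eigenvalues $k^{-\alpha}$, and an optimum $\theta_*$ with decaying coordinates. These power laws stand in for the regularity assumptions of the analysis, and every parameter value is hypothetical. Each sample is used once (single pass) and the step size is fixed.

```python
import numpy as np


def sgd_noiseless(d=200, n_steps=5000, alpha=2.0, seed=0):
    """Single-pass, fixed step-size SGD on the least-squares risk under the
    noiseless model Y = <theta_star, Phi>, with Gaussian features whose
    covariance has eigenvalues k^(-alpha)."""
    rng = np.random.default_rng(seed)
    eigs = np.arange(1, d + 1, dtype=float) ** (-alpha)   # spectrum of E[Phi Phi^T]
    theta_star = rng.normal(size=d) * eigs                 # hypothetical optimum
    gamma = 1.0 / (4.0 * eigs.sum())                       # conservative fraction of 1 / E||Phi||^2
    theta = np.zeros(d)

    def excess_risk(th):
        # 0.5 * E[(<th - theta_star, Phi>)^2]; the optimal risk is 0 without noise
        return 0.5 * np.sum(eigs * (th - theta_star) ** 2)

    risks = [excess_risk(theta)]
    for _ in range(n_steps):
        phi = rng.normal(size=d) * np.sqrt(eigs)   # fresh sample: each point is used once
        y = phi @ theta_star                        # noiseless output
        theta -= gamma * (phi @ theta - y) * phi    # gradient of 0.5 * (<theta, phi> - y)^2
        risks.append(excess_risk(theta))
    return np.array(risks)


risks = sgd_noiseless()
for t in (10, 100, 1000, 5000):
    print(t, risks[t])
```

Plotting `risks` against the iteration count on a log-log scale gives an empirical decay exponent. That exponent can then be compared across choices of `alpha` and of the decay of `theta_star`.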

Consistent Structured Prediction with Max-Min Margin Markov Networks

Max-margin methods for binary classification such as the support vector machine (SVM) have been extended to the structured prediction setting under the name of max-margin Markov networks (M3N), or more generally structural SVMs. Unfortunately, these methods are statistically inconsistent when the relationship between inputs and labels is far from deterministic. We overcome such limitations by defining the learning problem in terms of a “max-min” margin formulation, naming the resulting method max-min margin Markov networks (M4N). We prove consistency and finite sample generalization bounds for M4N and provide an explicit algorithm to compute the estimator. The algorithm achieves a generalization error of $O (1 / \sqrt{n})$ for a total cost of $O (n)$ projection-oracle calls (which have at most the same cost as the max-oracle from M3N). Experiments on multi-class classification, ordinal regression, sequence prediction and ranking demonstrate the effectiveness of the proposed method.

Fast and Robust Stability Region Estimation for Nonlinear Dynamical Systems

A linear quadratic regulator can stabilize a nonlinear dynamical system with a local feedback controller around a linearization point, while minimizing a given performance criterion. An important practical problem is to estimate the region of attraction of such a controller, that is, the region around this point where the controller is certified to be valid. This is especially important in the context of highly nonlinear dynamical systems. In this paper, we propose two stability certificates that are fast to compute and robust when the first or second derivatives of the system dynamics are bounded. Combined with an efficient oracle to compute these bounds, this yields a stability region estimation algorithm that is simpler than classical state-of-the-art approaches. We experimentally validate that it can be applied to both polynomial and non-polynomial systems of various dimensions, including standard robotic systems, for estimating regions of attraction around equilibrium points, as well as for trajectory tracking.
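To make the setting concrete, the sketch below stabilizes a hypothetical inverted pendulum around its upright equilibrium with an LQR controller. It then estimates a region of attraction by sampling: the largest sublevel set of $V(x) = x^\top P x$ on which no sampled state has $\dot V(x) \ge 0$ along the nonlinear closed loop. This sampling heuristic is not the certificate proposed in the paper. It gives no guarantee between samples or outside the sampled box. The pendulum parameters, the weights $Q = R = I$ and the box are assumptions. The Riccati equation is solved with NumPy alone, through the stable invariant subspace of the Hamiltonian matrix.

```python
import numpy as np

# Hypothetical inverted pendulum: theta'' = (g/l) sin(theta) - c/(m l^2) theta' + u/(m l^2),
# where theta = 0 is the upright (unstable) equilibrium.
g, l, m, c = 9.81, 1.0, 1.0, 0.1


def f(x, u):
    theta, omega = x[..., 0], x[..., 1]
    return np.stack([omega, (g / l) * np.sin(theta) - c / (m * l**2) * omega + u / (m * l**2)], axis=-1)


def lqr(A, B, Q, R):
    """Continuous-time LQR via the stable invariant subspace of the Hamiltonian matrix."""
    n = A.shape[0]
    R_inv = np.linalg.inv(R)
    ham = np.block([[A, -B @ R_inv @ B.T], [-Q, -A.T]])
    vals, vecs = np.linalg.eig(ham)
    V = vecs[:, vals.real < 0]                 # basis of the n-dimensional stable subspace
    P = np.real(V[n:] @ np.linalg.inv(V[:n]))  # Riccati solution P = X2 X1^{-1}
    P = (P + P.T) / 2                          # remove round-off asymmetry
    return R_inv @ B.T @ P, P


A = np.array([[0.0, 1.0], [g / l, -c / (m * l**2)]])   # linearization at theta = 0
B = np.array([[0.0], [1.0 / (m * l**2)]])
K, P = lqr(A, B, np.eye(2), np.eye(1))

# Sampling-based estimate of a level such that V(x) = x^T P x decreases along the
# nonlinear closed loop u = -K x at every sampled state with V(x) < level.
rng = np.random.default_rng(0)
xs = rng.uniform([-np.pi, -2 * np.pi], [np.pi, 2 * np.pi], size=(50000, 2))
us = -(xs @ K.T)[:, 0]
V = np.einsum("ni,ij,nj->n", xs, P, xs)
V_dot = 2 * np.einsum("ni,ij,nj->n", xs, P, f(xs, us))
bad = (V_dot >= 0) & (V > 1e-12)
level = V[bad].min() if bad.any() else V.max()
print("K =", K, " estimated level:", level)
```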

Breaking the curse of dimensionality of Global Optimization of Non-convex functions

We consider the global minimization of smooth functions based solely on function evaluations. Algorithms that achieve the optimal number of function evaluations for a given precision level typically rely on explicitly constructing an approximation of the function which is then minimized with algorithms that have exponential running-time complexity. In this project, we consider an approach that jointly models the function to approximate and finds a global minimum. This is done by using infinite sums of squares of smooth functions and has strong links with polynomial sum-of-squares hierarchies. Leveraging recent representation properties of reproducing kernel Hilbert spaces, the infinite-dimensional optimization problem can be solved by subsampling in time polynomial in the number of function evaluations, and with theoretical guarantees on the obtained minimum.

Given $n$ samples, the computational cost is $O(n^{3.5})$ in time and $O(n^{2})$ in space, and we achieve a convergence rate to the global optimum of $O(n^{-m/d + 1/2 + 3/d})$, where $m$ is the degree of differentiability of the function and $d$ the number of dimensions. The rate is nearly optimal in the case of Sobolev functions and, more generally, makes the proposed method particularly suitable for functions that have a large number of derivatives. Indeed, when $m$ is of the order of $d$, the convergence rate to the global optimum does not suffer from the curse of dimensionality, which affects only the worst-case constants (which we track explicitly throughout the paper).
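The exponent in this rate can be checked with exact rational arithmetic. The helper below (a hypothetical name) evaluates $-m/d + 1/2 + 3/d$. The exponent is negative, so the bound decreases with $n$, exactly when $m > d/2 + 3$. For $m = d$ it equals $-1/2 + 3/d$, which does not deteriorate as $d$ grows.

```python
from fractions import Fraction


def rate_exponent(m, d):
    """Exponent e of the O(n^e) convergence rate quoted above: e = -m/d + 1/2 + 3/d."""
    return Fraction(-m, d) + Fraction(1, 2) + Fraction(3, d)


for m, d in [(2, 2), (2, 50), (10, 10), (50, 50), (100, 50)]:
    e = rate_exponent(m, d)
    print(f"m={m:>3}, d={d:>3}: exponent {e} ({float(e):+.3f})")
```

For example, $m = d = 50$ gives an exponent of $-11/25$, while $m = 2$, $d = 50$ gives $+13/25$, for which the bound gives no decay at all.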

Efficient improper learning for online logistic regression

We considered the setting of online logistic regression with the objective of minimizing the regret with respect to the $ℓ_{2}$-ball of radius $B$. It was known (see [Hazan et al., 2014]) that any proper algorithm with logarithmic regret in the number of samples (denoted $n$) necessarily suffers an exponential multiplicative constant in $B$. [Foster et al., 2018] showed that this lower bound does not apply to improper algorithms and proposed a strategy based on exponential weights, but with prohibitive computational complexity. In this work, we designed an efficient improper algorithm that avoids this exponential constant while preserving a logarithmic regret. Our new algorithm, based on regularized empirical risk minimization with surrogate losses, satisfies a regret scaling as $O(B \log(Bn))$ with a per-round time complexity of order $O(d^{2})$.
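For contrast, here is a sketch of a generic proper baseline, not the improper algorithm of this work: the Online Newton Step (ONS) applied to the logistic loss. Its standard regret bound involves the exp-concavity constant of the logistic loss on the $B$-ball, which is exponentially small in $B$. The sketch also shows how a per-round cost of order $d^2$ arises: the inverse of a $d \times d$ matrix is updated with the Sherman-Morrison formula instead of being recomputed. We omit the projection onto the ball in the $A_t$-norm that the original ONS requires. The parameters `gamma` and `eps` and the synthetic data are hypothetical.

```python
import numpy as np


def online_newton_step(X, y, gamma=0.5, eps=1.0):
    """Online Newton Step (without projection) for the logistic loss
    log(1 + exp(-y_t <w, x_t>)), keeping A_t^{-1} up to date in O(d^2) per round."""
    n, d = X.shape
    w = np.zeros(d)
    A = eps * np.eye(d)            # A_t = eps * I + sum_s g_s g_s^T (kept for checking only)
    A_inv = np.eye(d) / eps        # its inverse, updated by Sherman-Morrison
    cumulative_loss = 0.0
    for x_t, y_t in zip(X, y):
        z = y_t * (w @ x_t)
        cumulative_loss += np.logaddexp(0.0, -z)             # loss suffered at round t
        g = -y_t * x_t * 0.5 * (1.0 - np.tanh(z / 2.0))      # gradient: -y x sigmoid(-z)
        Ag = A_inv @ g
        A_inv -= np.outer(Ag, Ag) / (1.0 + g @ Ag)           # (A + g g^T)^{-1}
        A += np.outer(g, g)
        w -= (A_inv @ g) / gamma                             # Newton-style update
    return w, cumulative_loss, A, A_inv


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) / np.sqrt(5)
y = np.where(X @ np.ones(5) + 0.3 * rng.normal(size=500) > 0, 1.0, -1.0)
w, cum_loss, A, A_inv = online_newton_step(X, y)
print("average loss:", cum_loss / len(y))
```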

Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards

We considered the problem of sleeping bandits with stochastic action sets and adversarial rewards. In this setting, in contrast to most work in bandits, the actions may not be available at all times; for instance, some products might be out of stock in item recommendation. The best existing efficient (i.e., polynomial-time) algorithms for this problem only guarantee an $O(T^{2/3})$ upper bound on the regret, whereas inefficient algorithms based on EXP4 can achieve $O(\sqrt{T})$. In this work, we provided a new computationally efficient algorithm inspired by EXP3 with a regret of order $O(\sqrt{T})$ when the availabilities of each action $i \in \mathcal{A}$ are independent. We then studied the most general version of the problem, where at each round the available sets are generated from some unknown arbitrary distribution (i.e., without the independence assumption), and proposed an efficient algorithm with an $O(2^{K}\sqrt{T})$ regret guarantee. Our theoretical results were corroborated by experimental evaluations.
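The main mechanical difference from standard bandits is that the learner samples only among the arms available at the current round. This can be sketched with exponential weights restricted to the available set. The sketch is an illustrative simplification. Its loss estimates use naive importance weighting given the observed availability set, and the arm parameters are hypothetical. The regret guarantees quoted above apply to the paper's algorithms, not to this sketch.

```python
import numpy as np


def sleeping_probs(loss_estimates, available, eta):
    """Exponential weights restricted to the currently available arms
    (unavailable arms get probability 0)."""
    shifted = np.where(available, loss_estimates - loss_estimates[available].min(), 0.0)
    weights = np.where(available, np.exp(-eta * shifted), 0.0)
    return weights / weights.sum()


rng = np.random.default_rng(0)
K, T, eta = 5, 3000, 0.05
avail_prob = np.array([0.9, 0.5, 0.7, 0.3, 0.8])     # independent availabilities (hypothetical)
mean_loss = np.array([0.5, 0.2, 0.6, 0.1, 0.4])      # hypothetical Bernoulli loss means
loss_est = np.zeros(K)
for t in range(T):
    available = rng.random(K) < avail_prob
    if not available.any():
        continue                                     # no arm can be played this round
    p = sleeping_probs(loss_est, available, eta)
    arm = rng.choice(K, p=p)
    loss = float(rng.random() < mean_loss[arm])
    loss_est[arm] += loss / p[arm]                   # naive importance weighting
print(np.round(sleeping_probs(loss_est, np.ones(K, dtype=bool), eta), 3))
```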

Bilateral contracts and grants with industry

Bilateral contracts with industry

Microsoft Research: “Structured Large-Scale Machine Learning”. Machine learning is now ubiquitous in industry, science, engineering, and personal life. While early successes were obtained by applying off-the-shelf techniques, there are two main challenges faced by machine learning in the “big data” era: structure and scale. The project proposes to explore three axes, from theoretical, algorithmic and practical perspectives: (1) large-scale convex optimization, (2) large-scale combinatorial optimization and (3) sequential decision making for structured data. The project involves two Inria sites (Paris and Grenoble) and four MSR sites (Cambridge, New England, Redmond, New York). Project website: http://www.msr-inria.fr/projects/structured-large-scale-machine-learning/.

Bilateral grants with industry

Alexandre d’Aspremont, Francis Bach, Martin Jaggi (EPFL): Google Focused award.

Francis Bach: Gift from Facebook AI Research.

Alexandre d’Aspremont: AXA Foundation, scientific patronage ("Mécénat scientifique"), optimization & machine learning.

Partnerships and cooperations

International initiatives

Inria International Labs

4TUNE

Title:

Adaptive, Efficient, Provable and Flexible Tuning for Machine Learning

Duration:

2020 - 2022

Coordinator:

Francis Bach

Partners:

Machine Learning group, CWI (Netherlands)

Inria contact:

Francis Bach

Website:

http://pierre.gaillard.me/4tune/

Summary:

The long-term goal of 4TUNE is to push adaptive machine learning to the next level. We aim to develop refined methods, going beyond traditional worst-case analysis, for exploiting structure in the learning problem at hand. We will develop new theory and design sophisticated algorithms for the core tasks of statistical learning and individual sequence prediction. We are especially interested in understanding the connections between these tasks and developing unified methods for both. We will also investigate adaptivity to non-standard patterns encountered in embedded learning tasks, in particular in iterative equilibrium computations.

FOAM

Title:

First-Order Accelerated Methods for machine learning

Duration:

2020 - 2022

Coordinator:

Alexandre d'Aspremont

Partners:

Mathematical and Computational Engineering, Pontificia Universidad Católica de Chile (Chile)

Inria contact:

Alexandre d'Aspremont

Website:

https://sites.google.com/view/cguzman/talks-and-events/foam-associate-team

Summary:

Our main interest is to investigate novel and improved convergence results for first-order iterative methods for saddle points, variational inequalities and fixed points, through the lens of performance estimation problems (PEP). Our interest in improving first-order methods is also deeply related to applications in machine learning. In particular, in sparsity-oriented inverse problems, optimization methods are the workhorse for state-of-the-art results. On some of these problems, a set of new hypotheses and theoretical results shows improved complexity bounds for problems with good recovery guarantees, and we plan to extend these new performance bounds to the variational framework.

European initiatives

FP7 & H2020 Projects

European Research Council: SEQUOIA project (grant number 724063), 2017-2022 (F. Bach), “Robust algorithms for learning from modern data”.

National initiatives

Alexandre d'Aspremont: IRIS, PSL “Science des données, données de la science”.

Dissemination

Promoting scientific activities

Scientific events: selection

Member of the conference program committees

Pierre Gaillard, member of the program committee for the Conference on Learning Theory (COLT), 2020

Reviewer

Adrien Taylor, reviewer for International Conference on Machine Learning (ICML), 2020 (top reviewer award).

Adrien Taylor, reviewer for the Conference on Neural Information Processing Systems (NeurIPS), 2020 (top reviewer award).

Adrien Taylor, reviewer for Conference on Decision and Control (CDC), 2020.

Pierre Gaillard, reviewer for the International Conference on Artificial Intelligence and Statistics (AISTATS), 2020

Journal

Member of the editorial boards

Francis Bach, co-editor-in-chief, Journal of Machine Learning Research

Francis Bach, associate editor, Mathematical Programming

Francis Bach, associate editor, Foundations of Computational Mathematics (FoCM)

Reviewer - reviewing activities

Adrien Taylor, reviewer for Automatica.

Adrien Taylor, reviewer for Journal of Machine Learning Research (JMLR).

Adrien Taylor, reviewer for Mathematical Programming (MAPR).

Adrien Taylor, reviewer for SIAM Journal on Optimization (SIOPT).

Adrien Taylor, reviewer for Computational Optimization and Applications (COAP).

Adrien Taylor, reviewer for Journal of Optimization Theory and Applications (JOTA).

Pierre Gaillard, reviewer for Mathematics of Operations Research (MOR).

Invited talks

Adrien Taylor, invited talk at University of Cambridge (CCIMI seminars), February 2020, United Kingdom.

Adrien Taylor, invited talk at Université catholique de Louvain (Mathematical engineering seminars), February 2020, Belgium.

Adrien Taylor, invited talk at Pontificia Universidad Católica de Chile, April 2020, Online.

Adrien Taylor, invited talk at One World Optimization seminars, June 2020, Online.

Adrien Taylor, invited talk at CWI-INRIA workshop, September 2020, Online.

Pierre Gaillard, invited talk at the Valpred workshop, March 2020

Pierre Gaillard, invited talk at the Potsdamer research seminar, June 2020, online.

Pierre Gaillard, invited talk at the seminar of the Statify research team, Inria Grenoble, September 2020

Alessandro Rudi, invited talk at University College London, Gatsby Unit, London, October 2020.

Francis Bach, invited virtual talk at Optimization for Machine Learning, CIRM, Luminy, March 2020.

Francis Bach, invited talk at MIT, September 2020

Francis Bach, invited virtual talk at the University of Texas, Austin, October 2020

Francis Bach, invited virtual talk at the Symposium on the Mathematical Foundations of Data Science, Johns Hopkins University, October 2020

Francis Bach, invited virtual talk at Harvard University, November 2020

Francis Bach, invited virtual talk at CIMAT, Centro de Investigación en Matemáticas, Mexico, November 2020

Teaching - Supervision - Juries

Teaching

Master: Alexandre d'Aspremont, Optimisation Combinatoire et Convexe, with Zhentao Li (2015-present), lectures 30h, Master M1, ENS Paris.

Master: Alexandre d'Aspremont, Optimisation convexe: modélisation, algorithmes et applications, lectures 21h (2011-present), Master M2 MVA, ENS PS.

Master: Francis Bach, Optimisation et apprentissage statistique, 20h, Master M2 (Mathématiques de l'aléatoire), Université Paris-Sud, France.

Master: Francis Bach, Machine Learning, 20h, Master ICFP (Physique), Université PSL.

Master: Pierre Gaillard, Alessandro Rudi, Introduction to Machine Learning, 52h, L3, ENS, Paris.

Master: Pierre Gaillard, Sequential learning, 20h, Master M2 MVA, ENS PS.

Hausdorff school on MCMC: Francis Bach, 6 hours.

Supervision

PhD in progress: Raphaël Berthier, started September 2017, supervised by Francis Bach and Pierre Gaillard.

PhD in progress: Radu Alexandru Dragomir, Bregman Gradient Methods, 2018, Alexandre d'Aspremont (joint with Jérôme Bolte)

PhD in progress: Mathieu Barré, Accelerated Polyak Methods, 2018, Alexandre d'Aspremont

PhD in progress: Grégoire Mialon, Sample Selection Methods, 2018, Alexandre d'Aspremont (joint with Julien Mairal)

PhD in progress: Manon Romain, Causal Inference Algorithms, 2020, Alexandre d'Aspremont

PhD in progress: Alex Nowak-Vila, supervised by Francis Bach and Alessandro Rudi.

PhD in progress: Ulysse Marteau-Ferey, supervised by Francis Bach and Alessandro Rudi.

PhD in progress: Vivien Cabannes, supervised by Francis Bach and Alessandro Rudi.

PhD in progress: Eloise Berthier, supervised by Francis Bach.

PhD in progress: Theo Ryffel, supervised by Francis Bach and David Pointcheval.

PhD in progress: Rémi Jezequel, supervised by Pierre Gaillard and Alessandro Rudi.

PhD in progress: Antoine Bambade, supervised by Jean Ponce (Willow), Justin Carpentier (Willow), and Adrien Taylor.

PhD in progress: Marc Lambert, supervised by Francis Bach and Silvère Bonnabel.

PhD in progress: Ivan Lerner, co-advised with Anita Burgun and Antoine Neuraz.

PhD defended: Alexandre Défossez, supervised by Francis Bach and Léon Bottou (Facebook AI Research), defended in July 2020

PhD defended: Loucas Pillaud-Vivien, supervised by Francis Bach and Alessandro Rudi, defended on October 30, 2020

PhD defended: Margaux Brégère, supervised by Pierre Gaillard and Gilles Stoltz (Université Paris-Sud), defended in December 2020

PhD defended: Thomas Kerdreux, New Complexity Bounds for Frank-Wolfe, 2017, Alexandre d'Aspremont

Juries

HDR Pierre Weiss, IMT Toulouse, September 2019 (Alexandre d'Aspremont).

HDR Rémi Flamary, Université de Nice, November 2019 (Francis Bach).

Non-parametric Models for Non-negative Functions. Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi. July 2020.

Sharpness, Restart and Acceleration. Vincent Roulet, Alexandre d'Aspremont. SIAM Journal on Optimization, 30(1):262-289, October 2020.

Max-Plus Linear Approximations for Deterministic Continuous-State Markov Decision Processes. Eloïse Berthier, Francis Bach. IEEE Control Systems Letters, 4(3):767-772, July 2020.

Ranking and synchronization from pairwise measurements via SVD. Alexandre d'Aspremont, Mihai Cucuringu, Hemant Tyagi. Journal of Machine Learning Research, 22(19):1-63, February 2021.

Worst-Case Convergence Analysis of Inexact Gradient and Newton Methods Through Semidefinite Programming Performance Estimation. Etienne De Klerk, François Glineur, Adrien Taylor. SIAM Journal on Optimization, 30(3):2053-2082, January 2020.

Efficient First-order Methods for Convex Minimization: a Constructive Approach. Yoel Drori, Adrien B. Taylor. Mathematical Programming, Series A, 184:183-220, 2020.

Operator Splitting Performance Estimation: Tight Contraction Factors and Optimal Parameter Selection. Ernest Ryu, Adrien Taylor, Carolina Bergeling, Pontus Giselsson. SIAM Journal on Optimization, 30(3):2251-2271, January 2020.

Who started this rumor? Quantifying the natural differential privacy guarantees of gossip protocols. Aurélien Bellet, Rachid Guerraoui, Hadrien Hendrikx. DISC 2020 - 34th International Symposium on Distributed Computing, Freiburg / Virtual, Germany, 2020.

Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss. Lenaic Chizat, Francis Bach. Proceedings of Thirty Third Conference on Learning Theory, COLT 2020 - 33rd Annual Conference on Learning Theory, Graz / Virtual, Austria, July 2020. PMLR 125:1305-1338.

Gamification of pure exploration for linear bandits. Rémy Degenne, Pierre Ménard, Xuedong Shang, Michal Valko. ICML 2020 - 37th International Conference on Machine Learning, Vienna / Virtual, Austria, July 2020.

Experimental Comparison of Semi-parametric, Parametric, and Machine Learning Models for Time-to-Event Analysis Through the Concordance Index. Camila Fernandez, Chung Shue Chen, Pierre Gaillard, Alonso Silva. JDS 2020 - 52nd Statistics Days of the French Statistical Society (SFdS), Nice, France, May 2020.

Dual-Free Stochastic Decentralized Optimization with Variance Reduction. Hadrien Hendrikx, Francis Bach, Laurent Massoulié. Advances in Neural Information Processing Systems Proceedings, NeurIPS 2020 - 34th Conference on Neural Information Processing Systems, Vancouver / Virtual, Canada, 2020.

Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization. Hadrien Hendrikx, Lin Xiao, Sébastien Bubeck, Francis Bach, Laurent Massoulié. Proceedings of Machine Learning Research, ICML 2020 - Thirty-seventh International Conference on Machine Learning, Vienna / Virtual, Austria, 2020.

Convergence and Stability of Graph Convolutional Networks on Large Random Graphs. Nicolas Keriven, Alberto Bietti, Samuel Vaiter. NeurIPS 2020 - 34th Conference on Neural Information Processing Systems, Vancouver (virtual), Canada, December 2020. https://nips.cc/

A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention. Grégoire Mialon, Dexiong Chen, Alexandre d'Aspremont, Julien Mairal. ICLR 2021 - The Ninth International Conference on Learning Representations, Virtual, France, May 2021.

Screening Data Points in Empirical Risk Minimization via Ellipsoidal Regions and Safe Loss Functions. Grégoire Mialon, Alexandre d'Aspremont, Julien Mairal. AISTATS 2020 - 23rd International Conference on Artificial Intelligence and Statistics, Palermo / Virtual, Italy, June 2020.

Statistical Estimation of the Poincaré constant and Application to Sampling Multimodal Distributions. Loucas Pillaud-Vivien, Francis Bach, Tony Lelièvre, Alessandro Rudi, Gabriel Stoltz. AISTATS 2020: 23rd International Conference on Artificial Intelligence and Statistics, Palermo / Virtual, Italy, August 2020.

Improved sleeping bandits with stochastic action sets and adversarial rewards. Aadirupa Saha, Pierre Gaillard, Michal Valko. ICML 2020 - 37th International Conference on Machine Learning, Vienna / Virtual, Austria, July 2020.

Naive Feature Selection: Sparsity in Naive Bayes. Armin Askari, Alexandre d'Aspremont, Laurent El Ghaoui. AISTATS 2020 - 23rd International Conference on Artificial Intelligence and Statistics, Palermo / Virtual, Italy, June 2020.

Complexity Guarantees for Polyak Steps with Momentum. Mathieu Barré, Adrien Taylor, Alexandre d'Aspremont. COLT 2020 - 33rd Annual Conference on Learning Theory, Graz / Virtual, Austria, July 2020.

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model. Raphaël Berthier, Francis Bach, Pierre Gaillard. NeurIPS '20 - 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, December 2020.

Structured Prediction with Partial Labelling through the Infimum Loss. Vivien Cabannes, Alessandro Rudi, Francis Bach. Proceedings of the 37th International Conference on Machine Learning, ICML 2020 - 37th International Conference on Machine Learning, Online, United States, July 2020. 119:1230-1239.

Self-Supervised VQ-VAE for One-Shot Music Style Transfer. Ondřej Cífka, Alexey Ozerov, Umut Simsekli, Gael Richard. ICASSP 2021 - IEEE International Conference on Acoustics, Speech and Signal Processing, Toronto / Virtual, Canada, June 2021.

Efficient improper learning for online logistic regression. Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi. COLT 2020 - 33rd Annual Conference on Learning Theory, Graz / Virtual, Austria, July 2020.

Regularity as Regularization: Smooth and Strongly Convex Brenier Potentials in Optimal Transport. François-Pierre Paty, Alexandre d'Aspremont, Marco Cuturi. AISTATS 2020 - 23rd International Conference on Artificial Intelligence and Statistics, Palermo / Virtual, Italy, June 2020.

Stochastic bandit algorithms for demand side management. Margaux Brégère. December 2020.

Accelerating conditional gradient methods. Thomas Kerdreux. June 2020.

FANOK: Knockoffs in Linear Time. Armin Askari, Quentin Rebjock, Alexandre d'Aspremont, Laurent El Ghaoui. October 2020.

On the Effectiveness of Richardson Extrapolation in Machine Learning. Francis Bach. July 2020.

Principled Analyses and Design of First-Order Methods with Inexact Proximal Operators. Mathieu Barré, Adrien Taylor, Francis Bach. September 2020.

Convergence of Constrained Anderson Acceleration. Mathieu Barré, Adrien Taylor, Alexandre d'Aspremont. December 2020.

A Continuized View on Nesterov Acceleration. Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Adrien Taylor. February 2021.

Fast and Robust Stability Region Estimation for Nonlinear Dynamical Systems. Eloïse Berthier, Justin Carpentier, Francis Bach. October 2020.

Deep Equals Shallow for ReLU Networks in Kernel Regimes. Alberto Bietti, Francis Bach. October 2020.

Global Convergence of Frank Wolfe on One Hidden Layer Networks. Alexandre d'Aspremont, Mert Pilanci. October 2020.

Experimental Comparison of Semi-parametric, Parametric, and Machine Learning Models for Time-to-Event Analysis Through the Concordance Index. Camila Fernandez, Chung Shue Chen, Pierre Gaillard, Alonso A.
Silva March 2020 An Approximate Shapley-Folkman Theorem Thomas T. Kerdreux Igor I. Colin Alexandre A. D'Aspremont October 2020 The recursive variational Gaussian approximation (R-VGA) Marc M. Lambert Silvere S. Bonnabel Francis F. Bach December 2020 Non-parametric Models for Non-negative Functions Ulysse U. Marteau-Ferey Francis F. Bach Alessandro A. Rudi July 2020 Finite-sample analysis of M-estimators using self-concordance Dmitrii M. D. Ostrovskii Francis F. Bach November 2020 Non-stationary Online Regression Anant A. Raj Pierre P. Gaillard Christophe C. Saad November 2020 Finding Global Minima via Kernel Approximations Alessandro A. Rudi Ulysse U. Marteau-Ferey Francis F. Bach December 2020 ARIANN: Low-Interaction Privacy-Preserving Deep Learning via Function Secret Sharing Théo T. Ryffel David D. Pointcheval Francis F. Bach July 2020 Counterfactual Learning of Continuous Stochastic Policies Houssam H. Zenati Alberto A. Bietti Matthieu M. Martin Eustache E. Diemert Julien J. Mairal June 2020