Section: New Results
Large-scale statistical learning
Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure
Participants : Alberto Bietti, Julien Mairal.
Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. Unfortunately, these techniques are unable to deal with stochastic perturbations of input data, induced for example by data augmentation. In such cases, the objective is no longer a finite sum, and the main candidate for optimization is the stochastic gradient descent method (SGD). In [14], we introduce a variance reduction approach for these settings when the objective is composite and strongly convex. The resulting convergence rate outperforms that of SGD, with a typically much smaller constant factor that depends only on the variance of the gradient estimates induced by the perturbations on a single example.
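The idea can be sketched as follows: keep a table of anchor gradients, one per example, and subtract the stored anchor while adding the table average, so that only the variance due to the perturbation of a single example remains. This is a minimal illustrative sketch, not the paper's exact algorithm; the function name, interface, and step sizes are assumptions.

```python
import numpy as np

def s_miso_sketch(grad_fn, perturb, n, dim, steps=10000, lr=0.01, lam=0.01):
    """Illustrative variance reduction under data perturbations.

    grad_fn(i, xi, x): gradient of the loss on example i perturbed by xi.
    perturb():         draws a random perturbation (e.g. data augmentation).
    The table z stores one anchor gradient per example; subtracting z[i]
    and adding back the table average cancels the across-example variance,
    leaving only the variance due to perturbing a single example.
    """
    x = np.zeros(dim)
    z = np.zeros((n, dim))           # anchor gradient table
    z_mean = z.mean(axis=0)
    for _ in range(steps):
        i = np.random.randint(n)
        g_i = grad_fn(i, perturb(), x)
        g = g_i - z[i] + z_mean      # variance-reduced gradient estimate
        x -= lr * (g + lam * x)      # lam: l2 term for strong convexity
        z_mean += (g_i - z[i]) / n   # keep the running average in sync
        z[i] = g_i
    return x
```

For instance, on a least-squares problem whose features are perturbed by small additive noise at every access, the iterates approach the regularized solution despite the objective no longer being a finite sum.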
Catalyst Acceleration for First-order Convex Optimization: from Theory to Practice
Participants : Hongzhou Lin, Julien Mairal, Zaid Harchaoui.
In this paper [35], we introduce a generic scheme for accelerating gradient-based optimization methods in the sense of Nesterov. The approach, called Catalyst, builds upon the inexact accelerated proximal point algorithm for minimizing a convex objective function, and consists of approximately solving a sequence of well-chosen auxiliary problems, leading to faster convergence. One of the keys to achieving acceleration, both in theory and in practice, is to solve these subproblems with appropriate accuracy, using the right stopping criterion and the right warm-start strategy. In this paper, we give practical guidelines for using Catalyst and present a comprehensive theoretical analysis of its global complexity. We show that Catalyst applies to a large class of algorithms, including gradient descent, block coordinate descent, incremental algorithms such as SAG, SAGA, SDCA, SVRG, Finito/MISO, and their proximal variants. For all of these methods, we provide acceleration and explicit support for non-strongly convex objectives. We conclude with extensive experiments showing that acceleration is useful in practice, especially for ill-conditioned problems.
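The outer loop described above can be sketched in a few lines: each iteration approximately minimizes the auxiliary objective f(x) + (kappa/2)||x - y_k||^2 with any first-order method, warm-started at the previous solution, followed by a Nesterov-style extrapolation. This is a simplified sketch; the extrapolation weight is illustrative (the paper derives the proper sequence and stopping criteria).

```python
import numpy as np

def catalyst_sketch(f_grad, inner_solver, x0, kappa, iters=20):
    """Simplified Catalyst outer loop (illustrative parameters).

    inner_solver(grad, x_init) approximately minimizes the function whose
    gradient is `grad`, starting from x_init (the warm start).  At each
    outer iteration we solve the auxiliary problem
        h_k(x) = f(x) + (kappa/2) * ||x - y_k||^2
    then apply Nesterov-style extrapolation.
    """
    x = np.array(x0, dtype=float)
    y = x.copy()
    beta = 0.5                            # illustrative extrapolation weight
    for _ in range(iters):
        h_grad = lambda z, y=y: f_grad(z) + kappa * (z - y)
        x_new = inner_solver(h_grad, x)   # warm start at current iterate
        y = x_new + beta * (x_new - x)    # extrapolation step
        x = x_new
    return x
```

Any of the inner solvers listed in the paragraph (gradient descent, SVRG, MISO, ...) can play the role of `inner_solver`; the auxiliary term kappa/2 ||x - y||^2 improves the conditioning of each subproblem, which is where the speedup on ill-conditioned problems comes from.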
A Generic Quasi-Newton Algorithm for Faster Gradient-Based Optimization
Participants : Hongzhou Lin, Julien Mairal, Zaid Harchaoui.
In this paper [34], we propose a generic approach to accelerate gradient-based optimization algorithms with quasi-Newton principles. The proposed scheme, called QuickeNing, can be applied to incremental first-order methods such as stochastic variance-reduced gradient (SVRG) or incremental surrogate optimization (MISO). It is also compatible with composite objectives, meaning that it has the ability to provide exactly sparse solutions when the objective involves a sparsity-inducing regularization. QuickeNing relies on limited-memory BFGS rules, making it appropriate for solving high-dimensional optimization problems. Moreover, it enjoys a worst-case linear convergence rate for strongly convex problems. We present experimental results where QuickeNing gives significant improvements over competing methods for solving large-scale high-dimensional machine learning problems, see Figure 18 for example.
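The underlying mechanism can be sketched as follows: limited-memory BFGS rules are applied not to the (possibly nonsmooth) objective itself, but to its Moreau envelope, whose gradient at x is kappa * (x - p(x)), where p(x) approximately minimizes f(z) + (kappa/2)||z - x||^2. This is a minimal sketch under assumed interfaces (the full method adds a line search and careful control of the inner accuracy).

```python
import numpy as np

def quickening_sketch(f_grad, prox_solver, x0, kappa=1.0, iters=25, mem=5):
    """Illustrative QuickeNing-style scheme: L-BFGS on the Moreau envelope.

    prox_solver(grad, z0) approximately minimizes the function whose
    gradient is `grad`, warm-started at z0 (e.g. SVRG or MISO in the
    paper; the envelope gradient is kappa * (x - p(x)) with p(x) the
    approximate proximal point).  A unit step is used here for brevity.
    """
    x = np.asarray(x0, dtype=float)
    S, Y = [], []                    # limited-memory curvature pairs

    def env_grad(x):
        p = prox_solver(lambda z: f_grad(z) + kappa * (z - x), x)
        return kappa * (x - p)       # gradient of the Moreau envelope

    g = env_grad(x)
    for _ in range(iters):
        # standard L-BFGS two-loop recursion: q approximates H^{-1} g
        q, alphas = g.copy(), []
        for s, y in zip(reversed(S), reversed(Y)):
            a = (s @ q) / (y @ s)
            alphas.append(a)
            q -= a * y
        if S:                        # initial inverse-Hessian scaling
            q *= (S[-1] @ Y[-1]) / (Y[-1] @ Y[-1])
        for (s, y), a in zip(zip(S, Y), reversed(alphas)):
            q += (a - (y @ q) / (y @ s)) * s
        x_new = x - q                # unit step; full method line-searches
        g_new = env_grad(x_new)
        s, y = x_new - x, g_new - g
        if y @ s > 1e-12:            # cautious update keeps curvature valid
            S.append(s)
            Y.append(y)
            if len(S) > mem:
                S.pop(0)
                Y.pop(0)
        x, g = x_new, g_new
    return x
```

Because the envelope is smooth and its gradient is obtained from a proximal subproblem, sparsity of the solutions under a sparsity-inducing regularizer is preserved by the inner solver, while the outer L-BFGS rules provide the quasi-Newton speedup.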

Catalyst Acceleration for Gradient-Based Non-Convex Optimization
Participants : Courtney Paquette, Hongzhou Lin, Dmitriy Drusvyatskiy, Julien Mairal, Zaid Harchaoui.
In this paper [36], we introduce a generic scheme to solve nonconvex optimization problems using gradient-based algorithms originally designed for minimizing convex functions. When the objective is convex, the proposed approach enjoys the same properties as the Catalyst approach of Lin et al., 2015. When the objective is nonconvex, it achieves the best known convergence rate to stationary points for first-order methods. Specifically, the proposed algorithm does not require knowledge about the convexity of the objective; yet, it obtains an overall worst-case efficiency of $O\left(\epsilon^{-2}\right)$ and, if the function is convex, the complexity reduces to the near-optimal rate $O\left(\epsilon^{-2/3}\right)$. We conclude the paper by showing promising experimental results obtained by applying the proposed approach to SVRG and SAGA for sparse matrix factorization and for learning neural networks (see Figure 19).
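A rough sketch of the mechanism: adding (kappa/2)||x - y||^2 with kappa large enough makes each auxiliary subproblem convex even when f is not, so a convex solver applies; computing both a plain proximal-point candidate and an accelerated candidate and keeping the one with the lower objective preserves the convex speedup while guaranteeing descent, hence convergence to a stationary point. This sketch heavily simplifies the paper's scheme; names, the extrapolation weight, and the candidate rule are illustrative assumptions.

```python
import numpy as np

def nonconvex_catalyst_sketch(f, f_grad, inner_solver, x0, kappa, iters=30):
    """Illustrative nonconvex Catalyst-style outer loop.

    inner_solver(grad, x_init) approximately minimizes the convex
    auxiliary problem f(z) + (kappa/2)||z - c||^2 given its gradient.
    Two candidates are formed: a proximal-point step (centered at x)
    and an accelerated step (centered at the extrapolated point y);
    the one with the lower objective value is kept.
    """
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    beta = 0.5                                     # illustrative weight
    for _ in range(iters):
        h = lambda z, c: f_grad(z) + kappa * (z - c)
        x_prox = inner_solver(lambda z: h(z, x), x)  # safe descent step
        x_acc = inner_solver(lambda z: h(z, y), x)   # accelerated step
        x_new = x_acc if f(x_acc) <= f(x_prox) else x_prox
        y = x_new + beta * (x_new - x)
        x = x_new
    return x
```

On a simple nonconvex test function such as a quartic double well, the iterates reach a stationary point without the algorithm ever being told whether the objective is convex, which is the behavior the complexity bounds above capture.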