SIERRA - 2015 - Annual activity report

Section: New Results

Variance Reduced Stochastic Gradient Descent with Neighbors

Participant: Simon Lacoste-Julien [correspondent].

Collaboration with Thomas Hofmann [correspondent], Aurelien Lucchi and Brian McWilliams (ETH Zurich).

Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet its slow convergence can be a computational bottleneck. Variance reduction techniques such as SAG, SVRG and SAGA have been proposed to overcome this weakness, achieving linear convergence. However, these methods rely either on computing full gradients at pivot points or on keeping per-data-point corrections in memory, so their speed-ups relative to SGD may only materialize after a minimum number of epochs. This paper [15] investigates algorithms that exploit neighborhood structure in the training data to share and re-use information about past stochastic gradients across data points, which offers advantages in the transient optimization phase. As a by-product, we provide a unified convergence analysis for a family of variance reduction algorithms, which we call memorization algorithms. We provide experimental results supporting our theory.
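To make the idea of per-data-point corrections concrete, here is a minimal sketch of standard SAGA on a small least-squares problem. It is not the neighborhood-sharing variant studied in [15]; it only illustrates the memorization scheme that the abstract groups under that name. The function name `saga_least_squares`, the toy data, and the step size 1/(3L) are assumptions made for illustration.

```python
import random


def saga_least_squares(xs, ys, step, epochs, seed=0):
    """Plain SAGA for f(w) = (1/n) * sum_i 0.5 * (x_i . w - y_i)^2.

    Hypothetical helper for illustration: it keeps one stored gradient per
    data point (the "memory") and uses it to reduce gradient variance.
    """
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    # Memory: the last gradient evaluated for each data point.
    memory = [[0.0] * d for _ in range(n)]
    memory_mean = [0.0] * d
    for _ in range(epochs * n):
        i = rng.randrange(n)
        residual = sum(xj * wj for xj, wj in zip(xs[i], w)) - ys[i]
        grad_i = [residual * xj for xj in xs[i]]
        # Variance-reduced direction: fresh gradient, minus the stale stored
        # gradient for point i, plus the average of all stored gradients.
        direction = [g - a + m for g, a, m in zip(grad_i, memory[i], memory_mean)]
        w = [wj - step * dj for wj, dj in zip(w, direction)]
        # Refresh the memory for point i and keep its running mean in sync.
        memory_mean = [m + (g - a) / n for m, g, a in zip(memory_mean, grad_i, memory[i])]
        memory[i] = grad_i
    return w


# Toy data (assumed): the least-squares solution is w = [3, 0], with nonzero
# residuals, so the per-point gradients do not vanish at the optimum.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
ys = [1.0, 2.0, 3.0, 5.0]
# Step 1/(3L), with L = max_i ||x_i||^2 = 2 for this data.
w = saga_least_squares(xs, ys, step=1.0 / 6.0, epochs=200)
print(w)
```

Because the residuals are nonzero at the optimum, plain SGD with a constant step would keep fluctuating around the solution, whereas the stored gradients let the update direction shrink to zero there. The cost is the O(n) memory table, and the table needs to be filled before the correction becomes informative, which is the transient phase the abstract refers to.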
