2024 Activity Report: Team SIERRA
Inria teams are typically groups of researchers working on the definition of a common project and objectives, with the goal of creating a project-team. Such project-teams may include other partners (universities or research institutions).
RNSR: 201120973D
- Research center: Inria Paris Centre
- In partnership with: Ecole normale supérieure de Paris, CNRS
- Team name: Machine Learning and Optimisation
- In collaboration with: Département d'Informatique de l'Ecole Normale Supérieure
- Domain: Applied Mathematics, Computation and Simulation
- Theme: Optimization, machine learning and statistical methods
Keywords
Computer Science and Digital Science
- A3.4. Machine learning and statistics
- A5.4. Computer vision
- A6.2. Scientific computing, Numerical Analysis & Optimization
- A7.1. Algorithms
- A8.2. Optimization
- A9.2. Machine learning
Other Research Topics and Application Domains
- B9.5.6. Data science
1 Team members, visitors, external collaborators
Research Scientists
- Francis Bach [Team leader, Inria, Senior Researcher, HDR]
- Michael Jordan [Fondation Inria]
- Alessandro Rudi [INRIA, Researcher, until Sep 2024]
- Umut Simsekli [INRIA, Researcher]
- Adrien Taylor [INRIA, Researcher]
- Alexandre d'Aspremont [CNRS, Senior Researcher]
Post-Doctoral Fellows
- Luc Brogat-Motte [Centrale Supelec, Post-Doctoral Fellow]
- Yurong Chen [INRIA, Post-Doctoral Fellow, from Oct 2024]
- Fajwel Fogel [ENS PARIS, Post-Doctoral Fellow]
- Maxime Haddouche [INRIA, Post-Doctoral Fellow, from Sep 2024]
- David Holzmuller [INRIA, Post-Doctoral Fellow]
- Frederik Kunstner [INRIA, Post-Doctoral Fellow, from Oct 2024]
- Anant Raj [INRIA, Post-Doctoral Fellow, until Sep 2024]
- Fabian Schaipp [INRIA, Post-Doctoral Fellow, from Sep 2024]
- Corbinian Schlosser [INRIA, Post-Doctoral Fellow]
- Yang Su [ENS PARIS, Post-Doctoral Fellow]
- Paul Viallard [INRIA, Post-Doctoral Fellow, until Jan 2024]
- Julien Weibel [INRIA, Post-Doctoral Fellow, from Dec 2024]
PhD Students
- Andrea Basteri [INRIA]
- Eugene Berta [INRIA, from May 2024]
- Gaspard Beugnot [INRIA, until Feb 2024]
- Pierre Boudart [INRIA]
- Nabil Boukir [INRIA, from Nov 2024]
- Sacha Braun [INRIA, from Sep 2024]
- Sarah Brood [ENS Paris]
- Arthur Calvi [CNRS]
- Theophile Cantelobre [INRIA, until Sep 2024]
- Aymeric Capitaine [CMAP]
- Benjamin Dupuis [INRIA]
- Bertille Follain [ENS PARIS, until Nov 2024]
- Alexandre Francois [INRIA, from Sep 2024]
- Etienne Gauthier [INRIA, from Sep 2024]
- Mahmoud Hegazy [CMAP, from Oct 2024]
- Marc Lambert [DGA]
- Clément Lezane [UNIV TWENTE]
- Simon Martin [ENS Paris]
- Céline Moucer [ENS PARIS-SACLAY]
- Benjamin Paul-Dubois-Taine [UNIV PARIS SACLAY]
- Antoine Scheid [CMAP]
- Dario Shariatian [INRIA]
- Lawrence Stewart [INRIA]
Interns and Apprentices
- Melih Barsbey [UNIV BOGAZICI, until Aug 2024]
- Daniel Einar Berg Thomsen [INRIA, from Nov 2024]
- Eliot Beyler [ENS PARIS, Intern, from Sep 2024]
- Eliot Beyler [ENS PARIS, Intern, from Apr 2024 until Jul 2024]
- Nabil Boukir [INRIA, Intern, from May 2024 until Sep 2024]
- Sacha Braun [INRIA, Intern, from Apr 2024 until Aug 2024]
- Clementine Chazal [INRIA, Intern, from Feb 2024 until Aug 2024]
- Léo Dana [ENS DE LYON, Intern, from Sep 2024]
- Alexandre Francois [INRIA, Intern, from Apr 2024 until Aug 2024]
- Etienne Gauthier [INRIA, Intern, from Apr 2024 until Aug 2024]
- Aaron Mishkin [INRIA, from Jul 2024]
Administrative Assistants
- Marina Kovacic [INRIA]
- Abigail Palma [INRIA, from Sep 2024]
Visiting Scientists
- Rayna Andreeva [UNIV EDIMBOURG, from Feb 2024 until Jun 2024]
- Ioan-Liviu Aolaritei [UNIV BERKELEY, from Oct 2024]
- Ioan-Liviu Aolaritei [ENS Paris, until Mar 2024]
- Eugene Berta [Inria, until Apr 2024]
- Laurent El Ghaoui [Inria International Chair, from Jun 2024 until Jun 2024]
- Sebastian Gregor Gruber [German Cancer Research Center (DKFZ), from Mar 2024 until May 2024]
- Steffen Grunewalder [UNIV NEWCASTLE, until Feb 2024]
- Antônio Horta Ribeiro [UNIV UPPSALA, from Nov 2024]
- Anne Rubbens [FNRS, from Apr 2024 until May 2024]
- Manu Upadhyaya [Lund University, from Sep 2024]
2 Overall objectives
2.1 Statement
Machine learning is a recent scientific domain, positioned between applied mathematics, statistics and computer science. Its goals are the optimization, control, and modeling of complex systems from examples. It applies to data from numerous engineering and scientific fields (e.g., vision, bioinformatics, neuroscience, audio processing, text processing, economy, finance, etc.), the ultimate goal being to derive general theories and algorithms allowing advances in each of these domains. Machine learning is characterized by the high quality and quantity of the exchanges between theory, algorithms and applications: interesting theoretical problems almost always emerge from applications, while theoretical analysis allows the understanding of why and when popular or successful algorithms do or do not work, and leads to proposing significant improvements.
Our academic positioning is exactly at the intersection between these three aspects—algorithms, theory and applications—and our main research goal is to make the link between theory and algorithms, and between algorithms and high-impact applications in various engineering and scientific fields.
3 Research program
Machine learning has emerged as a scientific domain of its own over the last 30 years, providing a useful abstraction of many problems and allowing exchanges of best practices between data-oriented scientific fields. Its main research areas currently include probabilistic models, supervised learning (including neural networks), unsupervised learning, reinforcement learning, and statistical learning theory. All of these are represented in the SIERRA team, but the main goals of the team are mostly related to supervised learning and optimization, their mutual interactions, and interdisciplinary collaborations. One particularity of the team is its strong focus on optimization (in particular convex optimization, with a growing body of recent work on non-convex problems), leading to contributions in optimization that go beyond the machine learning context. Moreover, we increasingly interact with other disciplines of applied mathematics (e.g., numerical analysis, control) and with economics.
We have divided our research effort into four axes:
- Optimization
- Statistical machine learning
- Machine learning in interaction
- Incentives and machine learning
4 Application domains
Machine learning research can be conducted from two main perspectives: the first one, which has been dominant in the last 30 years, is to design learning algorithms and theories which are as generic as possible, the goal being to make as few assumptions as possible regarding the problems to be solved and to let data speak for themselves. This has led to many interesting methodological developments and successful applications. However, we believe that this strategy has reached its limit for many application domains, such as computer vision, bioinformatics, neuro-imaging, text and audio processing, which leads to the second perspective our team is built on: Research in machine learning theory and algorithms should be driven by interdisciplinary collaborations, so that specific prior knowledge may be properly introduced into the learning process, in particular with the following fields:
- Computer vision: object recognition, object detection, image segmentation, image/video processing, computational photography. In collaboration with the Willow project-team.
- Bioinformatics: cancer diagnosis, protein function prediction, virtual screening.
- Text processing: document collection modeling, language models.
- Audio processing: source separation, speech/music processing.
- Climate science (satellite imaging).
5 Social and environmental responsibility
As a domain within applied mathematics and computer science, machine learning and artificial intelligence may contribute positively to the environment, for example by measuring the effects of climate change or by reducing the carbon footprint of other sciences and activities. But they may also contribute negatively, notably through the ever-increasing size of machine learning models. Within the team, we work on both aspects through our work on climate science and on frugal algorithms.
6 Highlights of the year
The team was re-created!
6.1 Awards
- Adrien Taylor: ERC Starting Grant (project CASPER)
- Francis Bach, Anant Raj: Roberto Tempo Best Paper Award at the IEEE Conference on Decision and Control
7 New software, platforms, open data
7.1 New software
7.1.1 PEPit
- Name: PEPit
- Keyword: Optimisation
- Functional Description:
PEPit is a Python package that simplifies access to worst-case analyses of a large family of first-order optimization methods, possibly involving gradient, projection, proximal, or linear optimization oracles, along with their approximate or Bregman variants. In short, PEPit enables computer-assisted worst-case analyses of first-order optimization methods. The key underlying idea is to cast the problem of performing a worst-case analysis, often referred to as a performance estimation problem (PEP), as a semidefinite program (SDP) which can be solved numerically. To do so, users are only required to write first-order methods nearly as they would have implemented them. The package then takes care of the SDP modelling, and the worst-case analysis is performed numerically via a standard solver (a minimal usage sketch is given below).
This software is primarily based on the works on performance estimation problems by Adrien Taylor. Compared to other scientific software, its maintenance is relatively low cost (we can do it ourselves, together with students involved in using those techniques). We plan to keep updating this software by incorporating recent advances of the community, with the clear long-term goal of making it a tool for teaching first-order optimization.
- URL:
- Contact: Adrien Taylor
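As an illustration, here is a minimal usage sketch in the spirit of the package's documented gradient-descent example (class and method names may differ slightly between PEPit versions): it numerically bounds the worst case of f(x_n) - f(x*) for gradient descent on L-smooth convex functions.

```python
# Minimal PEPit sketch (illustrative; API details may vary across versions).
from PEPit import PEP
from PEPit.functions import SmoothConvexFunction

L, n = 1.0, 5                                        # smoothness constant, iterations

problem = PEP()
f = problem.declare_function(SmoothConvexFunction, L=L)
xs = f.stationary_point()                            # an optimal point x*
fs = f(xs)                                           # optimal value f(x*)

x0 = problem.set_initial_point()
problem.set_initial_condition((x0 - xs) ** 2 <= 1)   # ||x0 - x*||^2 <= 1

x = x0
for _ in range(n):                                   # write the method as you would run it
    x = x - (1 / L) * f.gradient(x)

problem.set_performance_metric(f(x) - fs)            # worst-case value of f(x_n) - f(x*)
tau = problem.solve()                                # numerically solves the underlying SDP
print(tau)                                           # should be close to L / (4 n + 2)
```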
8 New results
8.1 Nonlinear conjugate gradient methods: worst-case convergence rates via computer-assisted analyses
In this work, we propose a computer-assisted approach to the analysis of the worst-case convergence of nonlinear conjugate gradient methods (NCGMs). Those methods are known for their generally good empirical performance for large-scale optimization, while having relatively incomplete analyses. Using our computer-assisted approach, we establish novel complexity bounds for the Polak-Ribière-Polyak (PRP) and the Fletcher-Reeves (FR) NCGMs for smooth strongly convex minimization. In particular, we construct mathematical proofs that establish the first non-asymptotic convergence bound for FR (which is historically the first developed NCGM), and a much improved non-asymptotic convergence bound for PRP. Additionally, we provide simple adversarial examples on which these methods do not perform better than gradient descent with exact line search, leaving very little room for improvement on the same class of problems.
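For reference, a minimal numerical sketch of the two updates analyzed here (PRP and FR), written for a quadratic objective so that the exact line search assumed in the analysis has a closed form; this only illustrates the standard NCGM template, not the computer-assisted proofs themselves.

```python
import numpy as np

def ncgm(A, b, x0, beta_rule="PRP", n_iter=50):
    """Nonlinear conjugate gradient on f(x) = 0.5 x'Ax - b'x (A symmetric PD),
    with exact line search (closed form for quadratics)."""
    x = x0.copy()
    g = A @ x - b                          # gradient of f at x
    d = -g                                 # first direction: steepest descent
    for _ in range(n_iter):
        alpha = -(g @ d) / (d @ (A @ d))   # exact minimizer of f along d
        x = x + alpha * d
        g_new = A @ x - b
        if beta_rule == "PRP":             # Polak-Ribiere-Polyak
            beta = g_new @ (g_new - g) / (g @ g)
        else:                              # Fletcher-Reeves
            beta = (g_new @ g_new) / (g @ g)
        d = -g_new + beta * d              # conjugate direction update
        g = g_new
    return x
```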
8.2 Automated tight Lyapunov analysis for first-order methods
In this work, we present a methodology for establishing the existence of quadratic Lyapunov inequalities for a wide range of first-order methods used to solve convex optimization problems. In particular, we consider (i) classes of optimization problems of finite-sum form with (possibly strongly) convex and possibly smooth functional components, (ii) first-order methods that can be written as a linear system on state-space form in feedback interconnection with the subdifferentials of the functional components of the objective function, and (iii) quadratic Lyapunov inequalities that can be used to draw convergence conclusions. We present a necessary and sufficient condition for the existence of a quadratic Lyapunov inequality within a predefined class of Lyapunov inequalities, which amounts to solving a small-sized semidefinite program. We showcase our methodology on several first-order methods that fit the framework. Most notably, our methodology allows us to significantly extend the region of parameter choices that allow for duality gap convergence in the Chambolle–Pock method when the linear operator is the identity mapping.
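Schematically, the certificates searched for are quadratic Lyapunov inequalities of the following form (notation ours; the Lyapunov functions considered in the paper may additionally involve function values), whose feasibility over a class of problems is checked by a small semidefinite program:

```latex
% Contraction certificate for the iterates \xi_k of the method:
V(\xi_{k+1}) \;\le\; \rho\, V(\xi_k) \quad \text{for all problems in the class},
\qquad V(\xi) = \xi^\top P\, \xi, \quad P \succeq 0,\ \rho \in (0,1].
```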
8.3 Generalization Bounds for Heavy-Tailed SDEs through the Fractional Fokker-Planck Equation
Understanding the generalization properties of heavy-tailed stochastic optimization algorithms has attracted increasing attention over the past years. While illuminating interesting aspects of stochastic optimizers by using heavy-tailed stochastic differential equations as proxies, prior works either provided expected generalization bounds, or introduced non-computable information theoretic terms. Addressing these drawbacks, in this work, we prove high-probability generalization bounds for heavy-tailed SDEs which do not contain any nontrivial information theoretic terms. To achieve this goal, we develop new proof techniques based on estimating the entropy flows associated with the so-called fractional Fokker-Planck equation (a partial differential equation that governs the evolution of the distribution of the corresponding heavy-tailed SDE). In addition to obtaining high-probability bounds, we show that our bounds have a better dependence on the dimension of parameters as compared to prior art. Our results further identify a phase transition phenomenon, which suggests that heavy tails can be either beneficial or harmful depending on the problem structure. We support our theory with experiments conducted in a variety of settings. Further information is in 7.
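For concreteness, a textbook form of the fractional Fokker-Planck equation referred to above, written for an SDE driven by symmetric α-stable Lévy noise (our notation, up to scaling conventions; α = 2 recovers the classical Fokker-Planck equation):

```latex
% Density p_t of  dX_t = b(X_t)\,dt + \sigma\, dL_t^{\alpha}  (L^{\alpha}: alpha-stable Levy process):
\partial_t p_t(x) \;=\; -\,\nabla \cdot \big( b(x)\, p_t(x) \big) \;-\; \sigma^{\alpha}\, (-\Delta)^{\alpha/2} p_t(x),
```

where (-Δ)^{α/2} denotes the fractional Laplacian.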
8.4 Implicit Compressibility of Overparametrized Neural Networks Trained with Heavy-Tailed SGD
Neural network compression has been an increasingly important subject, not only due to its practical relevance, but also due to its theoretical implications, as there is an explicit connection between compressibility and generalization error. Recent studies have shown that the choice of the hyperparameters of stochastic gradient descent (SGD) can have an effect on the compressibility of the learned parameter vector. These results, however, rely on unverifiable assumptions and the resulting theory does not provide a practical guideline due to its implicitness. In this study, we propose a simple modification for SGD, such that the outputs of the algorithm will be provably compressible without making any nontrivial assumptions. We consider a one-hidden-layer neural network trained with SGD, and show that if we inject additive heavy-tailed noise to the iterates at each iteration, for any compression rate, there exists a level of overparametrization such that the output of the algorithm will be compressible with high probability. To achieve this result, we make two main technical contributions: (i) we prove a 'propagation of chaos' result for a class of heavy-tailed stochastic differential equations, and (ii) we derive error estimates for their Euler discretization. Our experiments suggest that the proposed approach not only achieves increased compressibility with various models and datasets, but also leads to robust test performance under pruning, even in more realistic architectures that lie beyond our theoretical setting. Further information is in 24.
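A minimal sketch of the modification described above: plain minibatch SGD with additive symmetric α-stable (heavy-tailed) noise injected into the iterates. The gradient oracle, data, and hyperparameter values are placeholders, not the paper's exact setup.

```python
import numpy as np
from scipy.stats import levy_stable

def sgd_with_heavy_tailed_noise(grad_fn, w0, data, lr=0.01, alpha=1.8,
                                scale=1e-3, batch_size=32, n_iter=1000, seed=0):
    """SGD whose iterates are perturbed by additive symmetric alpha-stable noise
    (alpha < 2 gives heavy tails; alpha = 2 reduces to Gaussian noise)."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    n = len(data)
    for _ in range(n_iter):
        idx = rng.choice(n, size=batch_size, replace=False)
        g = grad_fn(w, [data[i] for i in idx])         # minibatch gradient
        noise = levy_stable.rvs(alpha, 0.0, scale=scale,
                                size=w.shape, random_state=rng)
        w = w - lr * g + noise                         # noisy SGD step
    return w
```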
8.5 Piecewise deterministic generative models
We introduce a novel class of generative models based on piecewise deterministic Markov processes (PDMPs), a family of non-diffusive stochastic processes consisting of deterministic motion and random jumps at random times. Similarly to diffusions, such Markov processes admit time reversals that turn out to be PDMPs as well. We apply this observation to three PDMPs considered in the literature: the Zig-Zag process, Bouncy Particle Sampler, and Randomised Hamiltonian Monte Carlo. For these three particular instances, we show that the jump rates and kernels of the corresponding time reversals admit explicit expressions depending on some conditional densities of the PDMP under consideration before and after a jump. Based on these results, we propose efficient training procedures to learn these characteristics and consider methods to approximately simulate the reverse process. Finally, we provide bounds in the total variation distance between the data distribution and the resulting distribution of our model in the case where the base distribution is the standard d-dimensional Gaussian distribution. Promising numerical simulations support further investigations into this class of models. Further information is in 3.
8.6 Topological Generalization Bounds for Discrete-Time Stochastic Optimization Algorithms
We present a novel set of rigorous and computationally efficient topology-based complexity notions that exhibit a strong correlation with the generalization gap in modern deep neural networks (DNNs). DNNs show remarkable generalization properties, yet the source of these capabilities remains elusive, defying the established statistical learning theory. Recent studies have revealed that properties of training trajectories can be indicative of generalization. Building on this insight, state-of-the-art methods have leveraged the topology of these trajectories, particularly their fractal dimension, to quantify generalization. Most existing works compute this quantity by assuming continuous- or infinite-time training dynamics, complicating the development of practical estimators capable of accurately predicting generalization without access to test data. In this paper, we respect the discrete-time nature of training trajectories and investigate the underlying topological quantities that can be amenable to topological data analysis tools. This leads to a new family of reliable topological complexity measures that provably bound the generalization error, eliminating the need for restrictive geometric assumptions. These measures are computationally friendly, enabling us to propose simple yet effective algorithms for computing generalization indices. Moreover, our flexible framework can be extended to different domains, tasks, and architectures. Our experimental results demonstrate that our new complexity measures correlate highly with generalization error in industry-standards architectures such as transformers and deep graph networks. Our approach consistently outperforms existing topological bounds across a wide range of datasets, models, and optimizers, highlighting the practical relevance and effectiveness of our complexity measures. Further information is in 1.
8.7 Heavy-Tail Phenomenon in Decentralized SGD
Recent theoretical studies have shown that heavy-tails can emerge in stochastic optimization due to 'multiplicative noise', even under surprisingly simple settings, such as linear regression with Gaussian data. While these studies have uncovered several interesting phenomena, they consider conventional stochastic optimization problems, which exclude decentralized settings that naturally arise in modern machine learning applications. In this paper, we study the emergence of heavy-tails in decentralized stochastic gradient descent (DE-SGD), and investigate the effect of decentralization on the tail behavior. We first show that, when the loss function at each computational node is twice continuously differentiable and strongly convex outside a compact region, the law of the DE-SGD iterates converges to a distribution with polynomially decaying (heavy) tails. To have a more explicit control on the tail exponent, we then consider the case where the loss at each node is a quadratic, and show that the tail-index can be estimated as a function of the step-size, batch-size, and the topological properties of the network of the computational nodes. Then, we provide theoretical and empirical results showing that DE-SGD has heavier tails than centralized SGD. We also compare DE-SGD to disconnected SGD where nodes distribute the data but do not communicate. Our theory uncovers an interesting interplay between the tails and the network structure: we identify two regimes of parameters (stepsize and network size), where DE-SGD can have lighter or heavier tails than disconnected SGD depending on the regime. Finally, to support our theoretical results, we provide numerical experiments conducted on both synthetic data and neural networks. Further information is in 8.
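The decentralized iteration studied here takes the usual gossip form (notation ours), where W is the doubly stochastic mixing matrix of the communication network and each node i holds its own loss f_i:

```latex
% Node i averages with its neighbors, then takes a local stochastic gradient step:
x_i^{(k+1)} \;=\; \sum_{j=1}^{N} W_{ij}\, x_j^{(k)} \;-\; \eta\, \widetilde{\nabla} f_i\big(x_i^{(k)}\big).
```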
8.8 Iteratively Reweighted Least Squares for Phase Unwrapping
The 2D phase unwrapping problem seeks to recover a phase image from its observation modulo 2π, and is a crucial step in a variety of imaging applications. In particular, it is one of the most time-consuming steps in the interferometric synthetic aperture radar (InSAR) pipeline. In this work we tackle the L1-norm phase unwrapping problem. In optimization terms, this is a simple sparsity-inducing problem, albeit in very large dimension. To solve this high-dimensional problem, we iteratively solve a series of numerically simpler weighted least squares problems, which are themselves solved using a preconditioned conjugate gradient method. Our algorithm guarantees a sublinear rate of convergence in function values, is simple to implement and can easily be ported to GPUs, where it significantly outperforms state-of-the-art phase unwrapping methods.
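A schematic version of the reweighted scheme described above, for a generic L1 regression problem min_x ||Ax - b||_1 (in the InSAR application, A and b encode the wrapped phase-gradient model; they are placeholders here, and the preconditioner is omitted):

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

def irls_l1(A, b, n_outer=20, eps=1e-6):
    """Iteratively reweighted least squares for min_x ||A x - b||_1.
    A: (m, n) sparse matrix or array, b: (m,) observations."""
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(n_outer):
        r = A @ x - b
        w = 1.0 / np.maximum(np.abs(r), eps)   # IRLS weights for the L1 loss
        # Weighted least-squares subproblem: A^T W A x = A^T W b, solved by CG.
        H = LinearOperator((n, n), matvec=lambda v: A.T @ (w * (A @ v)))
        x, _ = cg(H, A.T @ (w * b), x0=x)
    return x
```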
8.9 Frank-Wolfe meets Shapley-Folkman: a systematic approach for solving nonconvex separable problems with linear constraints
We consider separable nonconvex optimization problems under affine constraints. For these problems, the Shapley-Folkman theorem provides an upper bound on the duality gap as a function of the nonconvexity of the objective functions, but does not provide a systematic way to construct primal solutions satisfying that bound. In this work, we develop a two-stage approach to do so. The first stage approximates the optimal dual value with a large set of primal feasible solutions. In the second stage, this set is trimmed down to a primal solution by computing (approximate) Caratheodory representations. The main computational requirement of our method is tractability of the Fenchel conjugates of the component functions and their (sub)gradients. When the function domains are convex, the method recovers the classical duality gap bounds obtained via Shapley-Folkman. When the function domains are nonconvex, the method also recovers classical duality gap bounds from the literature, based on a more general notion of nonconvexity.
8.10 Open-Canopy: Towards Very High Resolution Forest Monitoring
Estimating canopy height and its changes at meter resolution from satellite imagery is a significant challenge in computer vision with critical environmental applications. However, the lack of open-access datasets at this resolution hinders the reproducibility and evaluation of models. We introduce Open-Canopy, the first open-access, country-scale benchmark for very high-resolution (1.5 m) canopy height estimation, covering over 87,000 km2 across France with 1.5 m resolution satellite imagery and aerial LiDAR data. Additionally, we present Open-Canopy-Δ, a benchmark for canopy height change detection between images from different years at tree level, a challenging task for current computer vision models. We evaluate state-of-the-art architectures on these benchmarks, highlighting significant challenges and opportunities for improvement. Our datasets and code are publicly available.
8.11 An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue Optimization Problems
Clément Lezane, Cristóbal Guzmán, Alexandre d'Aspremont.
In this work, we revisit the problem of solving large-scale semidefinite programs using randomized first-order methods and stochastic smoothing. We introduce two oblivious stochastic mirror descent algorithms based on a complementary composite setting. One algorithm is designed for non-smooth objectives, while an accelerated version is tailored for smooth objectives. Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function. We prove convergence rates for the non-smooth case with bounded oracles and for the L-smooth case with a feasible set of bounded diameter D, expressed in terms of the starting distance to an optimal solution and the stochastic oracle variance; such rates had previously only been obtained by assuming prior knowledge of the Lipschitz constant or of the starting distance to an optimal solution. We further show how to extend our framework to relative scale and demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.
8.12 Approximate Heavy Tails in Offline (Multi-Pass) Stochastic Gradient Descent
A recent line of empirical studies has demonstrated that SGD might exhibit a heavy-tailed behavior in practical settings, and the heaviness of the tails might correlate with the overall performance. In this work, we investigate the emergence of such heavy tails. Previous works on this problem only considered, up to our knowledge, online (also called single-pass) SGD, in which the emergence of heavy tails in theoretical findings is contingent upon access to an infinite amount of data. Hence, the underlying mechanism generating the reported heavy-tailed behavior in practical settings, where the amount of training data is finite, is still not well-understood. Our contribution aims to fill this gap. In particular, we show that the stationary distribution of offline (also called multi-pass) SGD exhibits ‘approximate’ power-law tails and the approximation error is controlled by how fast the empirical distribution of the training data converges to the true underlying data distribution in the Wasserstein metric. Our main takeaway is that, as the number of data points increases, offline SGD will behave increasingly ‘power-law-like’. To achieve this result, we first prove nonasymptotic Wasserstein convergence bounds for offline SGD to online SGD as the number of data points increases, which can be interesting on their own. Finally, we illustrate our theory on various experiments conducted on synthetic data and neural networks. Further details are in 16.
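In the notation used informally above, the two recursions being compared are (our notation) online SGD, which draws a fresh sample z_k from the data distribution μ at every step, and offline (multi-pass) SGD, which resamples from the n training points:

```latex
\theta_{k+1} = \theta_k - \eta\, \nabla \ell(\theta_k, z_k), \quad z_k \sim \mu,
\qquad\qquad
\hat{\theta}_{k+1} = \hat{\theta}_k - \eta\, \nabla \ell(\hat{\theta}_k, z_{i_k}), \quad i_k \sim \mathrm{Unif}\{1,\dots,n\}.
```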
8.13 Uniform-in-Time Wasserstein Stability Bounds for (Noisy) Stochastic Gradient Descent
Algorithmic stability is an important notion that has proven powerful for deriving generalization bounds for practical algorithms. The last decade has witnessed an increasing number of stability bounds for different algorithms applied on different classes of loss functions. While these bounds have illuminated various properties of optimization algorithms, the analysis of each case typically required a different proof technique with significantly different mathematical tools. In this study, we make a novel connection between learning theory and applied probability and introduce a unified guideline for proving Wasserstein stability bounds for stochastic optimization algorithms. We illustrate our approach on stochastic gradient descent (SGD) and we obtain time-uniform stability bounds (i.e., the bound does not increase with the number of iterations) for strongly convex losses and nonconvex losses with additive noise, where we recover similar results to the prior art or extend them to more general cases by using a single proof technique. Our approach is flexible and can be generalizable to other popular optimizers, as it mainly requires developing Lyapunov functions, which are often readily available in the literature. It also illustrates that ergodicity is an important component for obtaining time-uniform bounds – which might not be achieved for convex or non-convex losses unless additional noise is injected to the iterates. Finally, we slightly stretch our analysis technique and prove time-uniform bounds for SGD under convex and non-convex losses (without additional additive noise), which, to our knowledge, is novel. Further information is in 26.
8.14 Learning via Wasserstein-Based High Probability Generalisation Bounds
Minimising upper bounds on the population risk or the generalisation gap has been widely used in structural risk minimisation (SRM) – this is in particular at the core of PAC-Bayesian learning. Despite its successes and unfailing surge of interest in recent years, a limitation of the PAC-Bayesian framework is that most bounds involve a Kullback-Leibler (KL) divergence term (or its variations), which might exhibit erratic behavior and fail to capture the underlying geometric structure of the learning problem – hence restricting its use in practical applications. As a remedy, recent studies have attempted to replace the KL divergence in the PAC-Bayesian bounds with the Wasserstein distance. Even though these bounds alleviated the aforementioned issues to a certain extent, they either hold in expectation, are for bounded losses, or are nontrivial to minimize in an SRM framework. In this work, we contribute to this line of research and prove novel Wasserstein distance-based PAC-Bayesian generalisation bounds for both batch learning with independent and identically distributed (i.i.d.) data, and online learning with potentially non-i.i.d. data. Contrary to previous art, our bounds are stronger in the sense that (i) they hold with high probability, (ii) they apply to unbounded (potentially heavy-tailed) losses, and (iii) they lead to optimizable training objectives that can be used in SRM. As a result we derive novel Wasserstein-based PAC-Bayesian learning algorithms and we illustrate their empirical advantage on a variety of experiments. More information can be found in 23.
8.15 Generalization Guarantees via Algorithm-dependent Rademacher Complexity
Algorithm- and data-dependent generalization bounds are required to explain the generalization behavior of modern machine learning algorithms. In this context, there exists information theoretic generalization bounds that involve (various forms of) mutual information, as well as bounds based on hypothesis set stability. We propose a conceptually related, but technically distinct complexity measure to control generalization error, which is the empirical Rademacher complexity of an algorithm- and data-dependent hypothesis class. Combining standard properties of Rademacher complexity with the convenient structure of this class, we are able to (i) obtain novel bounds based on the finite fractal dimension, which (a) extend previous fractal dimension-type bounds from continuous to finite hypothesis classes, and (b) avoid a mutual information term that was required in prior work; (ii) we greatly simplify the proof of a recent dimension-independent generalization bound for stochastic gradient descent; and (iii) we easily recover results for VC classes and compression schemes, similar to approaches based on conditional mutual information. More information can be found in 21.
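The complexity measure in question is the standard empirical Rademacher complexity, evaluated on a hypothesis class H_S that is itself allowed to depend on the algorithm and on the sample S = (z_1, ..., z_n) (our notation):

```latex
\widehat{\mathcal{R}}_S(\mathcal{H}_S) \;=\; \mathbb{E}_{\sigma}\!\left[\, \sup_{h \in \mathcal{H}_S}\; \frac{1}{n} \sum_{i=1}^{n} \sigma_i\, \ell(h, z_i) \right],
\qquad \sigma_1,\dots,\sigma_n \ \text{i.i.d. uniform on } \{-1,+1\}.
```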
8.16 Generalization Bounds using Data-Dependent Fractal Dimensions
Providing generalization guarantees for modern neural networks has been a crucial task in statistical learning. Recently, several studies have attempted to analyze the generalization error in such settings by using tools from fractal geometry. While these works have successfully introduced new mathematical tools to apprehend generalization, they heavily rely on a Lipschitz continuity assumption, which in general does not hold for neural networks and might make the bounds vacuous. In this work, we address this issue and prove fractal geometry-based generalization bounds without requiring any Lipschitz assumption. To achieve this goal, we build up on a classical covering argument in learning theory and introduce a data-dependent fractal dimension. Despite introducing a significant amount of technical complications, this new notion lets us control the generalization error (over either fixed or random hypothesis spaces) along with certain mutual information (MI) terms. To provide a clearer interpretation to the newly introduced MI terms, as a next step, we introduce a notion of ‘geometric stability’ and link our bounds to the prior art. Finally, we make a rigorous connection between the proposed data-dependent dimension and topological data analysis tools, which then enables us to compute the dimension in a numerically efficient way. We support our theory with experiments conducted on various settings. More information can be found in 6.
8.17 Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions
Heavy-tail phenomena in stochastic gradient descent (SGD) have been reported in several empirical studies. Experimental evidence in previous works suggests a strong interplay between the heaviness of the tails and generalization behavior of SGD. To address this empirical phenomena theoretically, several works have made strong topological and statistical assumptions to link the generalization error to heavy tails. Very recently, new generalization bounds have been proven, indicating a non-monotonic relationship between the generalization error and heavy tails, which is more pertinent to the reported empirical observations. While these bounds do not require additional topological assumptions given that SGD can be modeled using a heavy-tailed stochastic differential equation (SDE), they can only apply to simple quadratic problems. In this work, we build on this line of research and develop generalization bounds for a more general class of objective functions, which includes non-convex functions as well. Our approach is based on developing Wasserstein stability bounds for heavy-tailed SDEs and their discretizations, which we then convert to generalization bounds. Our results do not require any nontrivial assumptions; yet, they shed more light to the empirical observations, thanks to the generality of the loss functions. More information can be found in 19.
8.18 Algorithmic Stability of Heavy-Tailed Stochastic Gradient Descent on Least Squares
Heavy-tail phenomena in stochastic gradient descent (SGD) have been reported in several empirical studies. Experimental evidence in previous works suggests a strong interplay between the heaviness of the tails and generalization behavior of SGD. To address this empirical phenomena theoretically, several works have made strong topological and statistical assumptions to link the generalization error to heavy tails. Very recently, new generalization bounds have been proven, indicating a non-monotonic relationship between the generalization error and heavy tails, which is more pertinent to the reported empirical observations. While these bounds do not require additional topological assumptions given that SGD can be modeled using a heavy-tailed stochastic differential equation (SDE), they can only apply to simple quadratic problems. In this work, we build on this line of research and develop generalization bounds for a more general class of objective functions, which includes non-convex functions as well. Our approach is based on developing Wasserstein stability bounds for heavy-tailed SDEs and their discretizations, which we then convert to generalization bounds. Our results do not require any nontrivial assumptions; yet, they shed more light to the empirical observations, thanks to the generality of the loss functions. More information can be found in 17.
8.19 Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize
Cyclic and randomized stepsizes are widely used in the deep learning practice and can often outperform standard stepsize choices such as constant stepsize in SGD. Despite their empirical success, not much is currently known about when and why they can theoretically improve the generalization performance. We consider a general class of Markovian stepsizes for learning, which contain i.i.d. random stepsize, cyclic stepsize as well as the constant stepsize as special cases, and motivated by the literature which shows that heaviness of the tails (measured by the so-called “tail-index”) in the SGD iterates is correlated with generalization, we study tail-index and provide a number of theoretical results that demonstrate how the tail-index varies on the stepsize scheduling. Our results bring a new understanding of the benefits of cyclic and randomized stepsizes compared to constant stepsize in terms of the tail behavior. We illustrate our theory on linear regression experiments and show through deep learning experiments that Markovian stepsizes can achieve even a heavier tail and be a viable alternative to cyclic and i.i.d. randomized stepsize rules. More information can be found in 9.
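For illustration, a sketch of the three stepsize schedules compared above (constant, cyclic, i.i.d. random), each a special case of the Markovian schedules analyzed in the paper; the grid of values and the SGD loop are placeholders.

```python
import itertools
import numpy as np

def constant_stepsizes(eta=0.1):
    while True:
        yield eta

def cyclic_stepsizes(etas=(0.02, 0.05, 0.1, 0.05)):
    yield from itertools.cycle(etas)            # deterministic cycle over the grid

def iid_random_stepsizes(etas=(0.02, 0.05, 0.1), seed=0):
    rng = np.random.default_rng(seed)
    while True:
        yield rng.choice(etas)                  # i.i.d. uniform over the grid

def sgd(grad_fn, w0, stepsizes, n_iter=1000):
    w = w0.copy()
    for _, eta in zip(range(n_iter), stepsizes):
        w = w - eta * grad_fn(w)                # stochastic gradient step
    return w
```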
8.21 Vision Transformers, a new approach for high-resolution and large-scale mapping of canopy heights
Accurate and timely monitoring of forest canopy heights is critical for assessing forest dynamics, biodiversity, carbon sequestration as well as forest degradation and deforestation. Recent advances in deep learning techniques, coupled with the vast amount of spaceborne remote sensing data offer an unprecedented opportunity to map canopy height at high spatial and temporal resolutions. Current techniques for wall-to-wall canopy height mapping correlate remotely sensed 2D information from optical and radar sensors to the vertical structure of trees using LiDAR measurements. While studies using deep learning algorithms have shown promising performances for the accurate mapping of canopy heights, they have limitations due to the type of architectures and loss functions employed. Moreover, mapping canopy heights over tropical forests remains poorly studied, and the accurate height estimation of tall canopies is a challenge due to signal saturation from optical and radar sensors, persistent cloud covers and sometimes the limited penetration capabilities of LiDARs. Here, we map heights at 10 m resolution across the diverse landscape of Ghana with a new vision transformer (ViT) model optimized concurrently with a classification (discrete) and a regression (continuous) loss function. This model achieves better accuracy than previously used convolution-based approaches (ConvNets) optimized with only a continuous loss function. The ViT model results show that our proposed discrete/continuous loss significantly increases the sensitivity for very tall trees (i.e., > 35 m), for which other approaches show saturation effects. The height maps generated by the ViT also have a better ground sampling distance and better sensitivity to sparse vegetation in comparison to a convolutional model. Our ViT model has an RMSE of 3.12 m against a reference dataset, while the ConvNet model has an RMSE of 4.3 m.
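A schematic PyTorch version of the joint discrete/continuous objective described above; the binning, loss weighting, and prediction heads are illustrative placeholders, not the exact configuration used in the paper.

```python
import torch
import torch.nn.functional as F

def hybrid_canopy_loss(pred_logits, pred_height, target_height,
                       bin_edges, w_cls=1.0, w_reg=1.0):
    """Combined discrete (classification over height bins) and continuous
    (regression) loss for canopy-height prediction.
    pred_logits: (B, n_bins, H, W) with n_bins = len(bin_edges) + 1,
    pred_height, target_height: (B, H, W), bin_edges: 1-D tensor of boundaries."""
    target_bin = torch.bucketize(target_height, bin_edges)   # discretized heights
    cls_loss = F.cross_entropy(pred_logits, target_bin)
    reg_loss = F.smooth_l1_loss(pred_height, target_height)  # robust regression term
    return w_cls * cls_loss + w_reg * reg_loss
```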
8.22 Non-Parametric Learning of Stochastic Differential Equations with Fast Rates of Convergence.
In 27, we propose a novel non-parametric learning paradigm for the identification of drift and diffusion coefficients of multi-dimensional non-linear stochastic differential equations, which relies upon discrete-time observations of the state. The key idea essentially consists of fitting an RKHS-based approximation of the corresponding Fokker-Planck equation to such observations, yielding theoretical estimates of non-asymptotic learning rates which, unlike previous works, become increasingly tighter when the regularity of the unknown drift and diffusion coefficients becomes higher. Since our method is kernel-based, offline pre-processing may be profitably leveraged to enable an efficient numerical implementation, offering an excellent balance between precision and computational complexity.
8.23 Structured Prediction in Online Learning.
In 69, we study a theoretical and algorithmic framework for structured prediction in the online learning setting. The problem of structured prediction, i.e., estimating a function whose output space lacks a vectorial structure, is well studied in the literature of supervised statistical learning. We show that our algorithm is a generalisation of optimal algorithms from the supervised learning setting, and achieves the same excess risk upper bound also when data are not i.i.d. Moreover, we consider a second algorithm designed especially for non-stationary data distributions, including adversarial data. We bound its stochastic regret as a function of the variation of the data distributions.
8.24 Closed-form Filtering for Non-linear Systems
Sequential Bayesian Filtering aims to estimate the current state distribution of a Hidden Markov Model, given the past observations. The problem is well known to be intractable for most application domains, except in notable cases such as the tabular setting or linear dynamical systems with Gaussian noise. In 71, we propose a new class of filters based on Gaussian PSD Models, which offer several advantages in terms of density approximation and computational efficiency. We show that filtering can be efficiently performed in closed form when transitions and observations are Gaussian PSD Models. When the transition and observations are approximated by Gaussian PSD Models, we show that our proposed estimator enjoys strong theoretical guarantees, with estimation error that depends on the quality of the approximation and is adaptive to the regularity of the transition probabilities. In particular, we identify regimes in which our proposed filter attains a prescribed total-variation error with memory and computational complexity (including the offline learning step) that compare favorably with the complexity of sampling methods such as particle filtering.
8.26 Naive Feature Selection: A Nearly Tight Convex Relaxation for Sparse Naive Bayes
Due to its linear complexity, naive Bayes classification remains an attractive supervised learning method, especially in very large-scale settings. We propose a sparse version of naive Bayes, which can be used for feature selection. This leads to a combinatorial maximum-likelihood problem, for which we provide an exact solution in the case of binary data, or a bound in the multinomial case. We prove that our bound becomes tight as the marginal contribution of additional features decreases. Both binary and multinomial sparse models are solvable in time almost linear in problem size, representing a very small extra relative cost compared to the classical naive Bayes. Numerical experiments on text data show that the naive Bayes feature selection method is as statistically effective as state-of-the-art feature selection methods such as recursive feature elimination, l1-penalized logistic regression and LASSO, while being orders of magnitude faster. For a large data set with 1.6 million training points and about 12 million features, and with a non-optimized CPU implementation, our sparse naive Bayes model can be trained in less than 15 seconds.
8.27 Physics-informed kernel learning
Physics-informed machine learning typically integrates physical priors into the learning process by minimizing a loss function that includes both a data-driven term and a partial differential equation (PDE) regularization. In 49, building on the formulation of the problem as a kernel regression task, we use Fourier methods to approximate the associated kernel, and propose a tractable estimator that minimizes the physics-informed risk function. We refer to this approach as physics-informed kernel learning (PIKL). This framework provides theoretical guarantees, enabling the quantification of the physical prior’s impact on convergence speed. We demonstrate the numerical performance of the PIKL estimator through simulations, both in the context of hybrid modeling and in solving PDEs. In particular, we show that PIKL can outperform physics-informed neural networks in terms of both accuracy and computation time. Additionally, we identify cases where PIKL surpasses traditional PDE solvers, particularly in scenarios with noisy boundary conditions.
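Schematically, the physics-informed risk minimized by such estimators combines a data-fit term with a PDE-residual penalty (generic form, our notation; D denotes the differential operator of the PDE D(f) = 0 on the domain Ω), with the minimization carried out over an RKHS whose kernel PIKL approximates by Fourier methods:

```latex
\widehat{R}_{\lambda}(f) \;=\; \frac{1}{n} \sum_{i=1}^{n} \big( f(x_i) - y_i \big)^2
\;+\; \lambda \int_{\Omega} \big( \mathcal{D}(f)(x) \big)^2 \, dx .
```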
8.28 Variational Dynamic Programming for Stochastic Optimal Control
In 53, we consider the problem of stochastic optimal control, where the state-feedback control policies take the form of a probability distribution and where a penalty on the entropy is added. By viewing the cost function as a Kullback-Leibler (KL) divergence between two joint distributions, we bring the tools from variational inference to bear on our optimal control problem. This allows for deriving a dynamic programming principle, where the value function is defined as a KL divergence again. We then resort to Gaussian distributions to approximate the control policies and apply the theory to control affine nonlinear systems with quadratic costs. This results in closed-form recursive updates, which generalize LQR control and the backward Riccati equation. We illustrate this novel method on the simple problem of stabilizing an inverted pendulum.
8.29 Variational Inference on the Boolean Hypercube with the Quantum Entropy
In 68, we derive variational inference upper bounds on the log-partition function of pairwise Markov random fields on the Boolean hypercube, based on quantum relaxations of the Kullback-Leibler divergence. We then propose an efficient algorithm to compute these bounds based on primal-dual optimization. An improvement of these bounds through the use of “hierarchies,” similar to sum-of-squares (SoS) hierarchies, is proposed, and we present a greedy algorithm to select among these relaxations. We carry out extensive numerical experiments and compare with state-of-the-art methods for this inference problem.
8.30 Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Large Language Models (LLMs) have unlocked new capabilities and applications; however, evaluating the alignment with human preferences still poses significant challenges. To address this issue, we introduce Chatbot Arena, an open platform for evaluating LLMs based on human preferences. Our methodology employs a pairwise comparison approach and leverages input from a diverse user base through crowdsourcing. The platform has been operational for several months, amassing over 240K votes. This paper describes the platform, analyzes the data we have collected so far, and explains the tried-and-true statistical methods we are using for efficient and accurate evaluation and ranking of models. We confirm that the crowdsourced questions are sufficiently diverse and discriminating and that the crowdsourced human votes are in good agreement with those of expert raters. These analyses collectively establish a robust foundation for the credibility of Chatbot Arena. Because of its unique value and openness, Chatbot Arena has emerged as one of the most referenced LLM leaderboards, widely cited by leading LLM developers and companies. Our demo is publicly available at https://chat.lmsys.org.
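Rankings from such pairwise human votes are typically obtained by fitting a Bradley-Terry model to the win counts; the sketch below shows that estimation step in a minimal form (illustrative only, not the platform's exact pipeline, and unregularized, so it assumes no model wins or loses all of its comparisons).

```python
import numpy as np
from scipy.optimize import minimize

def bradley_terry_scores(wins):
    """Fit Bradley-Terry strengths theta from a pairwise win-count matrix,
    where wins[i, j] is the number of times model i beat model j."""
    n = wins.shape[0]

    def neg_log_likelihood(theta):
        diff = theta[:, None] - theta[None, :]   # theta_i - theta_j
        log_p = -np.logaddexp(0.0, -diff)        # log P(i beats j) = log sigmoid(diff)
        return -np.sum(wins * log_p)

    res = minimize(neg_log_likelihood, np.zeros(n), method="L-BFGS-B")
    return res.x - res.x.mean()                  # scores are defined up to a shift
```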
8.31 Fair Allocation in Dynamic Mechanism Design
We consider a dynamic mechanism design problem where an auctioneer sells an indivisible good to groups of buyers in every round, for a total of T rounds. The auctioneer aims to maximize their discounted overall revenue while adhering to a fairness constraint that guarantees a minimum average allocation for each group. We begin by studying the static case (T=1) and establish that the optimal mechanism involves two types of subsidization: one that increases the overall probability of allocation to all buyers, and another that favors the groups which otherwise have a lower probability of winning the item. We then extend our results to the dynamic case by characterizing a set of recursive functions that determine the optimal allocation and payments in each round. Notably, our results establish that in the dynamic case, the seller, on the one hand, commits to a participation bonus to incentivize truth-telling, and on the other hand, charges an entry fee for every round. Moreover, the optimal allocation once more involves subsidization, whose extent depends on the difference in future utilities for both the seller and buyers when allocating the item to one group versus the others. Finally, we present an approximation scheme to solve the recursive equations and determine an approximately optimal and fair allocation efficiently.
9 Bilateral contracts and grants with industry
9.1 Bilateral grants with industry
- Chair “Marchés et Apprentissage” (Markets and Learning), held by Michael Jordan within the Fondation Inria and launched in July 2024, in partnership with Air Liquide, BNP Paribas Asset Management Europe, EDF, Orange, and SNCF.
- Francis Bach: Co-advised PhD student with Meta.
10 Partnerships and cooperations
10.1 International initiatives
10.1.1 Associate Teams in the framework of an Inria International Lab or in the framework of an Inria International Program
- FOAM: First-Order Accelerated Methods for Machine Learning
- Duration: 2020 - 2024
- Coordinator: Cristobal Guzman (crguzmanp@mat.uc.cl)
- Partners: Pontificia Universidad Católica de Chile, Santiago (Chile)
- Inria contact: Alexandre d'Aspremont
- Summary:
Our main interest is to investigate novel and improved convergence results for first-order iterative methods for saddle points, variational inequalities and fixed points, under the lens of PEP. Our interest in improving first-order methods is also deeply related to applications in machine learning. Particularly in sparsity-oriented inverse problems, optimization methods are the workhorse behind state-of-the-art results. On some of these problems, a set of new hypotheses and theoretical results shows improved complexity bounds for problems with good recovery guarantees, and we plan to extend these new performance bounds to the variational framework.
10.2 International research visitors
10.2.1 Visits of international scientists
Inria International Chair
Participants: Laurent El Ghaoui.
Other international visits to the team
Participants: Sebastian Gruber.
10.3 European initiatives
10.3.1 Horizon Europe
DYNASTY
DYNASTY project on cordis.europa.eu
- Title: Dynamics-Aware Theory of Deep Learning
- Duration: From October 1, 2022 to September 30, 2027
- Partners: INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET AUTOMATIQUE (INRIA), France
- Inria contact: Umut Simsekli
- Coordinator:
- Summary:
The recent advances in deep learning (DL) have transformed many scientific domains and have had major impacts on industry and society. Despite their success, DL methods do not obey most of the wisdoms of statistical learning theory, and the vast majority of the current DL techniques mainly stand as poorly understood black-box algorithms.
Even though DL theory has been a very active research field in the past few years, there is a significant gap between the current theory and practice: (i) the current theory often becomes vacuous for models with a large number of parameters (which is typical in DL), and (ii) it cannot capture the interaction between data, architecture, training algorithm and its hyper-parameters, which can have drastic effects on the overall performance. Due to this lack of theoretical understanding, designing new DL systems has been dominated by ad-hoc, 'trial-and-error' approaches.
The main objective of this proposal is to develop a mathematically sound and practically relevant theory for DL, which will ultimately serve as the basis of a software library that provides practical tools for DL practitioners. In particular, (i) we will develop error bounds that closely reflect the true empirical performance, by explicitly incorporating the dynamics aspect of training, (ii) we will develop new model selection, training, and compression algorithms with reduced time/memory/storage complexity, by exploiting the developed theory.
To achieve the expected breakthroughs, we will develop a novel theoretical framework, which will enable tight analysis of learning algorithms in the lens of dynamical systems theory. The outcomes will help relieve DL from being a black-box system and avoid the heuristic design process. We will produce comprehensive open-source software tools adapted to all popular DL libraries, and test the developed algorithms on a wide range of real applications arising in computer vision, audio/music/natural language processing.
CASPER
CASPER project on cordis.europa.eu
- Title: Systematic and computer-aided performance certification for numerical optimization
- Duration: From November 1, 2024 to October 31, 2029
- Partners: INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET AUTOMATIQUE (INRIA), France
- Inria contact: Adrien Taylor
- Coordinator:
- Summary:
Numerical optimization is a fundamental tool with a growing impact in many disciplines from science to industry. Many of its successes are due to theoretical advances, which are key to developing trust in numerical algorithms. While trust is non-negotiable in many applications, the complexity level of modern and future problems makes it very hard for theory to keep up with efficient proposals. Arguably worse, while both theory and experimental practice are key to the field, their respective recommendations often conflict with each other and the gap between theory and practice gets embarrassingly large.
The main objective of this proposal is to push forward the theoretical foundations of algorithmic optimization to drastically reduce the gap between fundamental theoretical understanding and practical scenarios. To achieve this, we will develop principled and systematic approaches to algorithmic analyses, as well as computer-aided performance certification tools. Whereas my recent works show that such techniques already allow going far beyond the surprisingly few classical templates for algorithmic analysis, they currently have very limited applicability beyond simple scenarios. We will largely broaden the techniques to develop and study modern algorithms with working guarantees that can (i) scale to unprecedented problem and data sizes, (ii) adapt to common problem structures, and (iii) be deployed on modern massively parallel computing environments. Along the way, this project will allow for simplified certification and validation of existing theory, an absolute necessity in this era of massive scientific production.
Outcomes of CASPER will include symbolic and numerical algorithmic certification and development tools, as well as algorithms with unprecedented working guarantees. The tools will be released as open-source libraries, and the algorithms will be validated on key benchmarks that include challenging machine learning and robotic tasks.
10.3.2 H2020 projects
REAL
REAL project on cordis.europa.eu
- Title: Reliable and cost-effective large scale machine learning
- Duration: From April 1, 2021 to March 31, 2026
- Partners: INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET AUTOMATIQUE (INRIA), France
- Inria contact: Alessandro Rudi
- Coordinator:
- Summary:
In the last decade, machine learning (ML) has become a fundamental tool with a growing impact in many disciplines, from science to industry. However, the scenario is now changing: data are growing exponentially faster than computational resources (the post-Moore's-law era), and ML algorithms are becoming crucial building blocks in complex systems for decision making, engineering, and science. Current machine learning is not suitable for this new scenario, both from a theoretical and a practical viewpoint: (a) the lack of cost-effectiveness of the algorithms directly impacts the economic/energetic costs of large-scale ML, making it barely affordable for universities or research institutes; (b) the lack of reliability of the predictions critically affects the safety of the systems where ML is employed. To deal with the challenges posed by this new scenario, REAL will lay the foundations of a solid theoretical and algorithmic framework for reliable and cost-effective large-scale machine learning on modern computational architectures. In particular, REAL will extend the classical ML framework to provide algorithms with two additional guarantees: (a) the predictions will be reliable, i.e., endowed with explicit bounds on their uncertainty guaranteed by the theory; (b) the algorithms will be cost-effective, i.e., they will be naturally adaptive to the new architectures and will provably achieve the desired reliability and accuracy level using the minimum possible computational resources. The algorithms resulting from REAL will be released as open-source libraries for distributed and multi-GPU settings, and their effectiveness will be extensively tested on key benchmarks from computer vision, natural language processing, audio processing, and bioinformatics. The methods and techniques developed in this project will help machine learning take the next step and become a safe, effective, and fundamental tool in science and engineering for large-scale data problems.
NN-OVEROPT
NN-OVEROPT project on cordis.europa.eu
- Title: Neural Network: An Overparametrization Perspective
- Duration: From November 1, 2021 to October 31, 2024
- Partners:
  - INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET AUTOMATIQUE (INRIA), France
  - THE BOARD OF TRUSTEES OF THE UNIVERSITY OF ILLINOIS (UNIVERSITY OF ILLINOIS), United States
- Inria contact: Francis Bach
- Coordinator:
- Summary:
In recent times, overparametrized models, where the number of model parameters far exceeds the number of available training samples, have become the methods of choice for learning problems, and neural networks are amongst the most popular overparametrized methods used heavily in practice. It has been discovered recently that overparametrization surprisingly improves the optimization landscape of a complex non-convex problem, namely the training of neural networks, and also has positive effects on generalization performance. Despite the improved empirical performance of overparametrized models like neural networks, the theoretical understanding of these models is quite limited, which hinders progress in the field. Any progress in understanding the optimization as well as the generalization aspects of these complex models, especially neural networks, will lead to major technical advances in machine learning and artificial intelligence. During the Marie Sklodowska-Curie Actions Individual Fellowship - Global Fellowship (MSCA-IF-GF), I plan to study the optimization problem arising while training overparametrized neural networks, as well as generalization in overparametrized neural networks. The end goal of this project is to provide a better theoretical understanding of the optimization landscape encountered when training overparametrized models, and thereby to design better optimization algorithms for training and to study the universal approximation guarantees of overparametrized models. We also aim to study the implicit bias induced by optimization algorithms when training overparametrized complex models. To achieve these objectives, I will use tools from traditional optimization theory, statistical learning theory, gradient flows, as well as statistical physics.
11 Dissemination
11.1 Promoting scientific activities
11.1.1 Scientific events: organisation
Member of the organizing committees
- Adrien Taylor
- Workshop organizer at Robotics: Science and Systems ("Frontiers of Optimization for Robotics")
- Organizer of 4 sessions of talks (16 invited speakers) at EUROPT 2024.
- Umut Simsekli
- Co-organizer of the Séminaire Parisien de Statistique
- Alexandre d'Aspremont
- Co-organizer of the Physics of AI workshop at Les Houches.
11.1.2 Scientific events: selection
Member of conference program committees
- M. Jordan: Program Committee, International Congress of Mathematicians
Reviewer
- Adrien Taylor:
- Reviewer for the Conference on Learning Theory (COLT)
- Reviewer for Robotics: Science and Systems (RSS)
- Umut Simsekli
- Area chair for NeurIPS 2024
- Area chair for Algorithmic Learning Theory 2025
11.1.3 Journal
Member of the editorial boards
- Alexandre d'Aspremont: Section editor for SIAM Journal on the Mathematics of Data Science (SIMODS).
Reviewer - reviewing activities
- Adrien Taylor:
- Reviewer for Mathematical Programming Series A
- Reviewer for Mathematical Programming Series B
- Reviewer for SIAM Journal on Optimization (SIOPT)
- Reviewer for Optimization Letters
- Reviewer for Journal of Machine Learning Research (JMLR)
- Reviewer for Journal of the Association for Computing Machinery (JACM)
- Umut Simsekli
- Reviewer for Applied Probability Journals
- Alessandro Rudi
- Reviewer for the Journal of Machine Learning Research
- Reviewer for Constructive Approximation
11.1.4 Invited talks
- Adrien Taylor
- Optimization seminar in the DAO team
- (declined for ecological reasons, December 2024) Invited plenary talk at the Opt-ML NeurIPS workshop in Vancouver.
- (declined for ecological reasons, July 2024) Invited speaker at the DOML workshop in Tokyo.
- (declined for ecological reasons, July 2024) Invited talk at the International Symposium on Mathematical Programming (ISMP), Montréal.
- (June 2024) Invited talk at the European Conference on Operational Research (EURO), Copenhagen.
- (June 2024) Invited talk at the One World Optimization Seminar, Vienna.
- (April 2024) Invited talk at the Applied Algebra and Analysis Seminar, Bremen.
- Umut Simsekli
- (April 2024) Invited lecturer at the Isaac Newton Institute (Cambridge University) programme called 'Heavy Tails in Machine Learning'
- (April 2024) Invited talk at University of Oxford
- (November 2024) Invited talk at ENSAE
- Alexandre d'Aspremont
- Invited talk, Mathematics of Data, Institute for Mathematical Sciences (IMS), Singapore.
- Francis Bach
- Invited talk, Statistical Methods for Post Genomic Data 2024
- Invited seminars, Ecole Polytechnique Fédérale de Lausanne
- Colloquium, Institut Mathématique de Toulouse
- Keynote speaker, International Symposium on Biomedical Imaging
- Invited seminars, University of Warwick
- CIME summer school, Italy
- Michael Jordan
- Bruno De Finetti Lecture, ISBA, Venice, Italy, 7/4/24
- Keynote Speaker, joint ISBA-Fusion 2024 Workshop, Venice, Italy, 7/8/24
- Keynote Speaker, North American Economics and AI+ML Meeting of the Econometric Society, Ithaca, NY, 8/14/24
- Keynote Speaker, Conference on the Bund, Shanghai, China, 9/8/24
- Plenary Speaker, Trieste Science Festival, Trieste, Italy, 9/28/24
- Keynote Speaker, AI-ML Systems Conference, Baton Rouge, LA, 10/9/24
- Keynote Speaker, NETGCOOP Conference, Inria Lille, France, 10/11/24
- Keynote Speaker, RECSYS Conference, Bari, Italy, 10/15/24
- Colloquium Speaker, Harvard Statistics, Cambridge, MA, 11/4/24
11.1.5 Scientific expertise
- F. Bach: Member of the Scientific Council of the Société Informatique de France, since 2022.
11.2 Teaching - Supervision - Juries
11.2.1 Teaching
- Master: Alexandre d'Aspremont, Optimisation convexe: modélisation, algorithmes et applications, lectures 21h (2011-present), Master M2 MVA, ENS PS.
- Master: Francis Bach, Learning theory from first principles, 27h, Master M2 MASH, Université Paris Dauphine PSL, France.
- Master: Alessandro Rudi, Umut Simsekli, Introduction to Machine Learning, 52h, L3, ENS, Paris.
- Master: Alessandro Rudi, Kernel Methods, 10h, Master M2 MVA, ENS PS.
- Master: Adrien Taylor, Convex Optimization, 21h, M1, ENS, Paris.
- Master: Adrien Taylor, Optimisation convexe stochastique (invited), 3h, Master M2 MVA, ENS PS.
- Master: Adrien Taylor, Optimization and deep learning, 12h, Ecole Polytechnique, Palaiseau, France.
- Master: Umut Simsekli, Deep Learning, 21h, M2, Ecole Polytechnique, Palaiseau, France.
- Master: Francis Bach, Learning theory from first principles, 27h, Master M2 IASD, PSL Research University, France.
11.2.2 Supervision
- Adrien Taylor:
- PhD defense: Antoine Bambade
- PhD defense: Céline Moucer (co-advised with Francis Bach)
- PhD defense: Baptiste Goujaud
- new PhD student (starting 10/2024): Roland Andrews
- new PhD student (starting 10/2024): Weijia Wang
- new intern (starting 11/2024): Daniel Berg Thomsen
- Umut Simsekli
- PhD Student: Benjamin Dupuis
- PhD Student: Dario Shariatian (with Alain Durmus)
- Postdoc: Fabian Schaipp (with Adrien Taylor and Francis Bach)
- Postdoc: Maxime Haddouche
- Alexandre d'Aspremont:
- Benjamin Dubois-Taine, PhD student
- Sarah Brood, PhD student
- Arthur Calvi, PhD student
- Yang Su, Postdoc
- Fajwel Fogel, Postdoc
- Alessandro Rudi
- PhD defense: Gaspard Beugnot
- PhD defense: Theophile Cantelobre
- Francis Bach
- PhD defense: Bertille Follain
- PhD defense: Marc Lambert
- PhD in progress: Lawrence Stewart
- PhD in progress: Simon Martin
- New PhD student: Juliette Decugis, co-advised with Gabriel Synnaeve and Taco Cohen (Meta)
- New PhD student: Alexandre François, co-advised with Antonio Orvieto (ELLIS Institute, Tübingen)
- New Post-doc: Frederik Kunstner
- New PhD student: Eugène Berta (co-advised with Michael Jordan)
- New PhD student: Sacha Braun (co-advised with Michael Jordan)
- Michael Jordan
- New PhD student: Nabil Boukir (co-advised with Francis Bach)
- New PhD student: Etienne Gauthier (co-advised with Francis Bach)
- New Post-doc: Yurong Chen
11.3 Popularization
11.3.1 Participation in Live events
- F. Bach: Public lecture, Ambassade de France, Rome, Italy
12 Scientific production
12.1 Major publications
- 1. Topological Generalization Bounds for Discrete-Time Stochastic Optimization Algorithms. Advances in Neural Information Processing Systems (PMLR), Vancouver, Canada, 2024. HAL.
- 2. Approximation Bounds for Sparse Programs. SIAM Journal on Mathematics of Data Science 4(2), June 2022, 514-530. HAL, DOI.
- 3. Piecewise deterministic generative models. Advances in Neural Information Processing Systems (PMLR), Vancouver, Canada, 2024. HAL.
- 4. Measuring dissimilarity with diffeomorphism invariance. February 2022. HAL, DOI.
- 5. Optimal Complexity and Certification of Bregman First-Order Methods. Mathematical Programming 194(1), July 2022, 41-83. HAL, DOI.
- 6. Generalization Bounds using Data-Dependent Fractal Dimensions. International Conference on Machine Learning (ICML 2023), Proceedings of Machine Learning Research, Honolulu, United States, July 2023. HAL.
- 7. Generalization Bounds for Heavy-Tailed SDEs through the Fractional Fokker-Planck Equation. International Conference on Machine Learning (PMLR), Vienna, Austria, 2024. HAL.
- 8. Heavy-Tail Phenomenon in Decentralized SGD. IISE Transactions, 2024. HAL.
- 9. Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize. Transactions on Machine Learning Research, 2023. HAL.
- 10. Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers. International Conference on Machine Learning, Baltimore, United States, 2022. HAL.
- 11. Generalized Sliced Probability Metrics. ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing, Singapore, May 2022, 4513-4517. HAL, DOI.
- 12. Global assessment of oil and gas methane ultra-emitters. Science 375(6580), February 2022, 557-561. HAL, DOI.
- 13. Chaotic Regularization and Heavy-Tailed Limits for Deterministic Gradient Descent. Advances in Neural Information Processing Systems, New Orleans, United States, 2022. HAL.
- 14. Non-parametric Models for Non-negative Functions. Working paper or preprint, July 2020. HAL.
- 15. Generalization Bounds for Stochastic Gradient Descent via Localized ε-Covers. Advances in Neural Information Processing Systems, Baltimore, United States, September 2022. HAL.
- 16. Krunoslav Lehman Pavasovic, Alain Durmus and Umut Simsekli. Approximate Heavy Tails in Offline (Multi-Pass) Stochastic Gradient Descent. Advances in Neural Information Processing Systems, October 2023. HAL.
- 17. Algorithmic Stability of Heavy-Tailed Stochastic Gradient Descent on Least Squares. Algorithmic Learning Theory, Singapore, 2023. HAL.
- 18. Anant Raj, Umut Şimşekli and Alessandro Rudi. Efficient Sampling of Stochastic Differential Equations with Positive Semi-Definite Models. Advances in Neural Information Processing Systems, 2023. HAL.
- 19. Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions. International Conference on Machine Learning, Honolulu, United States, 2023. HAL.
- 20. Sharpness, Restart and Acceleration. SIAM Journal on Optimization 30(1), October 2020, 262-289. HAL, DOI.
- 21. Generalization Guarantees via Algorithm-dependent Rademacher Complexity. Conference on Learning Theory, Bangalore (virtual event), India, July 2023. HAL.
- 22. Rate-Distortion Theoretic Generalization Bounds for Stochastic Learning Algorithms. COLT 2022 - 35th Annual Conference on Learning Theory, Proceedings of Machine Learning Research vol. 178, London, United Kingdom, July 2022. HAL.
- 23. Learning via Wasserstein-Based High Probability Generalisation Bounds. NeurIPS 2023 - Thirty-seventh Conference on Neural Information Processing Systems, New Orleans, United States, June 2023. HAL, DOI.
- 24. Implicit Compressibility of Overparametrized Neural Networks Trained with Heavy-Tailed SGD. International Conference on Machine Learning (PMLR), Vienna, Austria, 2024. HAL.
- 25. Non-Convex Optimization with Certificates and Fast Rates Through Kernel Sums of Squares. April 2022. HAL, DOI.
- 26. Lingjiong Zhu, Mert Gurbuzbalaban, Anant Raj and Umut Simsekli. Uniform-in-Time Wasserstein Stability Bounds for (Noisy) Stochastic Gradient Descent. Advances in Neural Information Processing Systems, 2023. HAL.
12.2 Publications of the year
International journals
International peer-reviewed conferences
Conferences without proceedings
Doctoral dissertations and habilitation theses
Reports & preprints
12.3 Cited publications
- 90. Non-Parametric Learning of Stochastic Differential Equations with Fast Rates of Convergence. Foundations of Computational Mathematics, March 2025. HAL.