Loucas Pillaud-Vivien



Since September 2020, I have been a postdoc in the Theory of Machine Learning group of Nicolas Flammarion at EPFL. I did my Ph.D. in the SIERRA team, which is part of the Computer Science Department of École Normale Supérieure (Ulm) and is also a joint team between CNRS and INRIA. I worked under the supervision of Francis Bach and Alessandro Rudi on stochastic approximation for high-dimensional learning problems.

Prior to that, I graduated from École Polytechnique in 2016 and obtained a master's degree in PDEs (ANEDP) from Paris VI and École des Ponts ParisTech. I wrote my master's thesis on molecular dynamics at CERMICS, where I was advised by Julien Reygner and Tony Lelievre.


Research interests

My main research interests are convex optimization, statistics, and PDEs. More precisely, here is a selection of research topics I am interested in:

  • Stochastic approximations in Hilbert spaces

  • Kernel methods

  • Stochastic Differential Equations (and PDEs) and how they can model machine learning problems

  • Interacting particle systems


Publications

  • A. Varre, L. Pillaud-Vivien, N. Flammarion.
Last iterate convergence of SGD for Least-Squares in the Interpolation regime. [arxiv:2102.03183, pdf], Preprint, 2021. [Show Abstract]

Abstract: Motivated by the recent successes of neural networks that have the ability to fit the data perfectly and generalize well, we study the noiseless model in the fundamental least-squares setup. We assume that an optimum predictor fits perfectly inputs and outputs ⟨θ∗,ϕ(X)⟩=Y, where ϕ(X) stands for a possibly infinite dimensional non-linear feature map. To solve this problem, we consider the estimator given by the last iterate of stochastic gradient descent (SGD) with constant step-size. In this context, our contribution is twofold: (i) from a (stochastic) optimization perspective, we exhibit an archetypal problem where we can show explicitly the convergence of the SGD final iterate for a non-strongly convex problem with constant step-size, whereas usual results use some form of averaging, and (ii) from a statistical perspective, we give explicit non-asymptotic convergence rates in the over-parameterized setting and leverage a fine-grained parameterization of the problem to exhibit polynomial rates that can be faster than O(1/T). The link with reproducing kernel Hilbert spaces is established.
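    A minimal numerical sketch of this setting (illustration only, not the paper's code; dimension, step-size, and iteration count are arbitrary choices): constant step-size SGD on a noiseless least-squares problem, where the last iterate itself converges to the interpolating predictor without any averaging.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Noiseless least-squares: the optimum interpolates, <theta_star, x> = y exactly.
    d = 20
    theta_star = rng.standard_normal(d)
    theta_star /= np.linalg.norm(theta_star)

    def sgd_last_iterate(n_steps=20000, step=0.1):
        """Constant step-size SGD; return the final (last) iterate, not an average."""
        theta = np.zeros(d)
        for _ in range(n_steps):
            x = rng.standard_normal(d) / np.sqrt(d)   # one fresh sample per step
            y = x @ theta_star                        # noiseless label
            theta -= step * (x @ theta - y) * x       # gradient of 0.5 * (x.theta - y)^2
        return theta

    theta_T = sgd_last_iterate()
    print(np.linalg.norm(theta_T - theta_star))
    ```

    Because the labels carry no additive noise, the error contracts multiplicatively at every step, which is why the final iterate converges here despite the constant step-size.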

  • L. Pillaud-Vivien.
    Learning with Reproducing Kernel Hilbert Spaces: Stochastic Gradient Descent and Laplacian Estimation. [pdf], Thesis manuscript, 2020.

  • L. Pillaud-Vivien, F. Bach, T. Lelievre, A. Rudi, G. Stoltz.
Statistical Estimation of the Poincaré constant and Application to Sampling Multimodal Distributions. [arxiv:1910.14564, pdf], accepted to Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2020. [Show Abstract]

    Abstract: Poincaré inequalities are ubiquitous in probability and analysis and have various applications in statistics (concentration of measure, rate of convergence of Markov chains). The Poincaré constant, for which the inequality is tight, is related to the typical convergence rate of diffusions to their equilibrium measure. In this paper, we show both theoretically and experimentally that, given sufficiently many samples of a measure, we can estimate its Poincaré constant. As a by-product of the estimation of the Poincaré constant, we derive an algorithm that captures a low dimensional representation of the data by finding directions which are difficult to sample. These directions are of crucial importance for sampling or in fields like molecular dynamics, where they are called reaction coordinates. Their knowledge can leverage, with a simple conditioning step, computational bottlenecks by using importance sampling techniques.
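    A toy illustration of this estimation idea (a sketch, not the paper's kernel-based algorithm): for a Gaussian measure, the Poincaré constant equals the largest eigenvalue of its covariance, so over linear test functions f(x) = a·x the Rayleigh quotient Var(f)/E‖∇f‖² is already maximized by the top eigenpair of the sample covariance, and the top eigenvector is the direction that is hardest to sample.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    # Anisotropic Gaussian: sampling mixes slowest along the stretched direction.
    cov = np.diag([9.0, 1.0, 1.0])
    X = rng.multivariate_normal(np.zeros(3), cov, size=20000)

    # For f(x) = a.x, Var(f)/E||grad f||^2 = (a.C a)/||a||^2, maximized by the
    # top eigenvalue of the covariance C (tight for Gaussian measures).
    C = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)
    poincare_est = eigvals[-1]        # estimate of the Poincaré constant (true value: 9)
    hard_direction = eigvecs[:, -1]   # the difficult direction ("reaction coordinate")
    print(poincare_est)
    ```

    The eigenvector recovered here plays the role of the reaction coordinate described in the abstract: conditioning on it is what makes importance-sampling schemes effective.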

  • T. Lelievre, L. Pillaud-Vivien, J. Reygner.
    Central Limit Theorem for stationary Fleming-Viot particle systems in finite spaces. [arXiv:1806.04490, pdf], ALEA Latin American Journal of Probability and Mathematical Statistics, 2018. [Show Abstract]

    Abstract: We consider the Fleming-Viot particle system associated with a continuous-time Markov chain in a finite space. Assuming irreducibility, it is known that the particle system possesses a unique stationary distribution, under which its empirical measure converges to the quasistationary distribution of the Markov chain. We complement this Law of Large Numbers with a Central Limit Theorem. Our proof essentially relies on elementary computations on the infinitesimal generator of the Fleming-Viot particle system, and involves the so-called π-return process in the expression of the asymptotic variance. Our work can be seen as an infinite-time version, in the setting of finite space Markov chains, of recent results by Cérou, Delyon, Guyader and Rousset [arXiv:1611.00515, arXiv:1709.06771].

  • L. Pillaud-Vivien, A. Rudi, F. Bach.
    Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes. [arXiv:1805.10074, pdf, poster], Advances in Neural Information Processing Systems (NeurIPS), 2018. [Show Abstract]

    Abstract: We consider stochastic gradient descent (SGD) for least-squares regression with potentially several passes over the data. While several passes have been widely reported to perform practically better in terms of predictive performance on unseen data, the existing theoretical analysis of SGD suggests that a single pass is statistically optimal. While this is true for low-dimensional easy problems, we show that for hard problems, multiple passes lead to statistically optimal predictions while single pass does not; we also show that in these hard models, the optimal number of passes over the data increases with sample size. In order to define the notion of hardness and show that our predictive performances are optimal, we consider potentially infinite-dimensional models and notions typically associated to kernel methods, namely, the decay of eigenvalues of the covariance matrix of the features and the complexity of the optimal predictor as measured through the covariance matrix. We illustrate our results on synthetic experiments with non-linear kernel methods and on a classical benchmark with a linear model.

  • L. Pillaud-Vivien, A. Rudi, F. Bach.
    Exponential convergence of testing error for stochastic gradient methods. [arXiv:1712.04755, pdf, video, poster], Proceedings of the International Conference on Learning Theory (COLT), 2018. [Show Abstract]

    Abstract: We consider binary classification problems with positive definite kernels and square loss, and study the convergence rates of stochastic gradient methods. We show that while the excess testing loss (squared loss) converges slowly to zero as the number of observations (and thus iterations) goes to infinity, the testing error (classification error) converges exponentially fast if low-noise conditions are assumed.
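    A small simulation of this phenomenon (an illustrative sketch with hypothetical parameters, not the paper's setup): linear features, labels with a hard margin (a low-noise condition), and averaged constant step-size SGD on the square loss. The 0-1 test error vanishes long before the squared test loss, which stays bounded away from zero.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    d = 5
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)

    def sample(n):
        """Linearly separable data with a hard margin (low-noise condition)."""
        X = rng.standard_normal((n, d))
        X = X[np.abs(X @ w) > 0.5]        # discard points inside the margin
        return X, np.sign(X @ w)

    Xtr, ytr = sample(4000)
    theta, theta_bar = np.zeros(d), np.zeros(d)
    for t, (x, y) in enumerate(zip(Xtr, ytr), 1):
        theta -= 0.05 * (x @ theta - y) * x   # SGD step on the square loss
        theta_bar += (theta - theta_bar) / t  # Polyak averaging

    Xte, yte = sample(4000)
    pred = Xte @ theta_bar
    sq_loss = np.mean((pred - yte) ** 2)      # stays bounded away from 0
    err01 = np.mean(np.sign(pred) != yte)     # essentially 0 already
    print(sq_loss, err01)
    ```

    The squared loss cannot reach zero because the best linear predictor of ±1 labels is not itself ±1-valued, yet its sign is correct on every point outside the margin, which is exactly the gap between the two convergence rates.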

PhD Thesis

I will defend my thesis on Friday, October 30, 2020, at 3 pm, at INRIA.

You can download the final version of the manuscript via this link [Thesis].

You can also have a look at the slides. (Available soon!)

Some Presentations

  • Two results on Stochastic gradient descent in Hilbert spaces for Machine Learning problems
[Slides]. CERMICS seminar, École des Ponts. November 2019.

  • Statistical Estimation of the Poincaré constant and Application to Sampling Multimodal Distributions.
    [Slides]. Sierra group seminar, INRIA. April 2019.

  • Statistical Optimality of Stochastic Gradient Descent through Multiple Passes.
    [Slides, Poster]. Optimization and Statistical Learning, Workshop in Les Houches. March 2019.

  • Langevin dynamics and applications to Machine Learning.
    [Slides]. Sierra group seminar, INRIA. February 2019.

  • Comparing Dynamics: Deep Neural Networks versus Glassy systems.
    [Slides]. Statistical Machine Learning in Paris (SMILE seminar). December 2018.

  • Statistical Optimality of Stochastic Gradient Descent through Multiple Passes.
    [Slides, Poster]. Advances in Neural Information Processing Systems (NeurIPS). December 2018.

  • Exponential convergence of testing error for stochastic gradient methods.
    [Slides, Video, Poster]. International Conference on Learning Theory (COLT). July 2018.


Reviewer for Journals:

  • Journal of Machine Learning Research

  • IoP Science: Machine Learning: Science and Technology

Reviewer for Conferences:

  • International Conference on Learning Representations (ICLR 2021)

  • Advances in Neural Information Processing Systems (NeurIPS 2019-20-21)

  • International Conference on Machine Learning (ICML 2020-21)