Ulysse Marteau-Ferey



Since April 2018, I have been a member of the SIERRA team, which is part of the Computer Science Department of École Normale Supérieure (Ulm) and is also a joint team between CNRS and Inria. I started my Ph.D. in September 2019, after completing my master's thesis in August 2018 together with a one-year internship. I work under the supervision of Francis Bach and Alessandro Rudi on stochastic approximation and optimization for high-dimensional learning problems.

I graduated from the mathematics department (DMA) of École Normale Supérieure de Paris (ENS Ulm) in 2019 and hold a master's degree in mathematics, vision and machine learning (MVA) from École Normale Supérieure Paris-Saclay.


Research interests

My main research interests are convex optimization, statistics and PDEs. More precisely, here is a selection of research topics I am interested in:

  • Stochastic approximations in Hilbert spaces

  • Kernel methods

  • Optimization (convex and more recently non-convex)


Publications

  • U. Marteau-Ferey, F. Bach, A. Rudi. Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses.
    [arXiv, hal, pdf, poster], NeurIPS, 2019.

    Abstract: In this paper, we study large-scale convex optimization algorithms based on the Newton method applied to regularized generalized self-concordant losses, which include logistic regression and softmax regression. We first prove that our new simple scheme based on a sequence of problems with decreasing regularization parameters is globally convergent, and that this convergence is linear with a constant factor which scales only logarithmically with the condition number. In the parametric setting, we obtain an algorithm with the same scaling as regular first-order methods but with an improved behavior, in particular in ill-conditioned problems. Second, in the non-parametric machine learning setting, we provide an explicit algorithm combining the previous scheme with Nyström projection techniques, and prove that it achieves optimal generalization bounds with a time complexity of order O(n·df_λ), a memory complexity of order O(df_λ^2) and no dependence on the condition number, generalizing the results known for least-squares regression. Here n is the number of observations and df_λ is the associated number of degrees of freedom. In particular, this is the first large-scale algorithm to solve logistic and softmax regressions in the non-parametric setting with large condition numbers and theoretical guarantees.
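    The decreasing-regularization idea in this abstract can be sketched for plain ℓ2-regularized logistic regression: solve a sequence of problems whose regularization shrinks geometrically toward the target, warm-starting Newton's method at the previous solution. This is a simplified illustration only, not the paper's algorithm (which also covers the non-parametric case via Nyström projections); the function name, decay factor and step counts below are illustrative choices.

    ```python
    import numpy as np

    def newton_decreasing_reg(X, y, lam_target, lam0=1.0, q=0.5, inner_steps=5):
        """Warm-started Newton for l2-regularized logistic regression along a
        geometrically decreasing sequence of regularization parameters
        (illustrative sketch; names and constants are not from the paper)."""
        n, d = X.shape
        w = np.zeros(d)
        lam = lam0
        while True:
            for _ in range(inner_steps):
                p = 1.0 / (1.0 + np.exp(-y * (X @ w)))       # sigma(y_i x_i^T w)
                grad = X.T @ ((p - 1.0) * y) / n + lam * w
                H = (X.T * (p * (1.0 - p))) @ X / n + lam * np.eye(d)
                w -= np.linalg.solve(H, grad)                # full Newton step
            if lam <= lam_target:                            # reached target regularization
                return w
            lam = max(q * lam, lam_target)                   # shrink toward lam_target
    ```

    Warm-starting is what makes the scheme cheap: each problem in the sequence is well conditioned relative to its regularization, so a handful of Newton steps suffice before the regularization is decreased again.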

  • U. Marteau-Ferey, D. Ostrovskii, F. Bach, A. Rudi. Beyond Least-Squares: Fast Rates for Regularized Empirical Risk Minimization through Self-Concordance.
    [arXiv, hal, pdf, poster, slides, video], COLT, 2019.

    Abstract: We consider learning methods based on the regularization of a convex empirical risk by a squared Hilbertian norm, a setting that includes linear predictors and non-linear predictors through positive-definite kernels. In order to go beyond the generic analysis leading to convergence rates of the excess risk as O(1/√n) from n observations, we assume that the individual losses are self-concordant, that is, their third-order derivatives are bounded by their second-order derivatives. This setting includes least-squares, as well as all generalized linear models such as logistic and softmax regression. For this class of losses, we provide a bias-variance decomposition and show that the assumptions commonly made in least-squares regression, such as the source and capacity conditions, can be adapted to obtain fast non-asymptotic rates of convergence by improving the bias terms, the variance terms or both.
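    The self-concordance condition invoked here (third derivative bounded by second derivative) is easy to verify numerically for the logistic loss. The sketch below assumes the convention φ(t) = log(1 + exp(−t)); writing s = sigmoid(−t), one gets φ''(t) = s(1 − s) and φ'''(t) = s(1 − s)(2s − 1), so |φ'''| ≤ φ'' holds with constant 1.

    ```python
    import numpy as np

    def sigmoid(t):
        return 1.0 / (1.0 + np.exp(-t))

    # Closed-form second and third derivatives of phi(t) = log(1 + exp(-t)),
    # checked on a grid: the self-concordance bound |phi'''| <= phi'' holds.
    t = np.linspace(-10, 10, 1001)
    s = sigmoid(-t)
    phi2 = s * (1 - s)                # phi''(t)
    phi3 = s * (1 - s) * (2 * s - 1)  # phi'''(t)
    assert np.all(np.abs(phi3) <= phi2)
    ```

    Since |2s − 1| ≤ 1 for s in [0, 1], the bound is in fact immediate from the closed forms; the numerical check is just a sanity test of the derivatives.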