Ulysse Marteau-Ferey

Briefly
Since April 2018, I have been a member of the SIERRA team, which is part of the Computer Science Department of École Normale Supérieure (ENS Ulm) and is a joint team between CNRS and INRIA. I started my Ph.D. in September 2019, after completing my master's thesis in August 2018 together with a one-year internship. I work under the supervision of Francis Bach and Alessandro Rudi on stochastic approximation and optimization for high-dimensional learning problems.
I graduated in 2019 from the mathematics department (DMA) of École Normale Supérieure de Paris (ENS Ulm), and hold a master's degree in mathematics, vision and machine learning (MVA) from École Normale Supérieure Paris-Saclay.
From September 2020 to January 2021, I was an intern at Capital Fund Management (CFM), working on exploratory Natural Language Processing applications.

Research interests
My main research interests are convex optimization, statistics, and PDEs.
Publications
A. Rudi, U. Marteau-Ferey, F. Bach. Finding Global Minima via Kernel Approximations. [arXiv, hal, pdf], arXiv preprint, 2020.
Abstract: We consider the global minimization of smooth functions based solely on function evaluations. Algorithms that achieve the optimal number of function evaluations for a given precision level typically rely on explicitly constructing an approximation of the function, which is then minimized with algorithms that have exponential running-time complexity. In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum. This is done by using infinite sums of square smooth functions and has strong links with polynomial sum-of-squares hierarchies. Leveraging recent representation properties of reproducing kernel Hilbert spaces, the infinite-dimensional optimization problem can be solved by subsampling in time polynomial in the number of function evaluations, and with theoretical guarantees on the obtained minimum. Given n samples, the computational cost is O(n^{3.5}) in time, O(n^2) in space, and we achieve a convergence rate to the global optimum that is O(n^{-m/d + 1/2 + 3/d}), where m is the degree of differentiability of the function and d the number of dimensions. The rate is nearly optimal in the case of Sobolev functions and more generally makes the proposed method particularly suitable for functions that have a large number of derivatives. Indeed, when m is in the order of d, the convergence rate to the global optimum does not suffer from the curse of dimensionality, which affects only the worst-case constants (that we track explicitly through the paper).
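To give a flavour of the approach, here is a minimal sketch of the underlying idea (not the paper's actual estimator, which relies on subsampling and a more careful reformulation to reach the stated complexity): model f(x) ≈ c + k(x)^T B k(x) with B positive semidefinite, so that c lower-bounds the model everywhere, and maximize c subject to interpolating the observed values. The Gaussian kernel, its bandwidth, the trace penalty weight and the use of cvxpy are illustrative choices of mine.

    # Sketch of the kernel sum-of-squares idea for global minimization.
    # Hypothetical choices throughout: Gaussian kernel, trace penalty, cvxpy.
    import numpy as np
    import cvxpy as cp

    def estimate_global_min(X, y, bandwidth=0.5, lam=1e-3):
        """X: (n, d) evaluation points, y: (n,) observed values f(X)."""
        n = X.shape[0]
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-sq / (2 * bandwidth ** 2))   # kernel matrix k(x_i, x_j)
        B = cp.Variable((n, n), PSD=True)        # PSD "sum-of-squares" part
        c = cp.Variable()                        # candidate minimum value
        # The model c + k(x)^T B k(x) must match f at every sample point.
        constraints = [y[i] == c + cp.quad_form(K[i], B) for i in range(n)]
        # Maximize the certified lower bound c, with a small trace penalty on B.
        cp.Problem(cp.Maximize(c - lam * cp.trace(B)), constraints).solve()
        return c.value

    # Noiseless evaluations of f(x) = (x - 0.3)^2 on a grid of [0, 1].
    X = np.linspace(0.0, 1.0, 30)[:, None]
    y = ((X - 0.3) ** 2).ravel()
    print(estimate_global_min(X, y))  # should be close to the true minimum 0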
U. Marteau-Ferey, F. Bach, A. Rudi. Non-parametric Models for Non-negative Functions. [arXiv, hal, pdf, poster, short slides, long slides], NeurIPS (spotlight), 2020.
Abstract: Linear models have shown great effectiveness and flexibility in many fields such as machine learning, signal processing and statistics. They can represent rich spaces of functions while preserving the convexity of the optimization problems where they are used, and are simple to evaluate, differentiate and integrate. However, for modeling non-negative functions, which are crucial for unsupervised learning, density estimation, or non-parametric Bayesian methods, linear models are not applicable directly. Moreover, current state-of-the-art models like generalized linear models either lead to non-convex optimization problems, or cannot be easily integrated. In this paper we provide the first model for non-negative functions which benefits from the same good properties of linear models. In particular, we prove that it admits a representer theorem and provide an efficient dual formulation for convex problems. We study its representation power, showing that the resulting space of functions is strictly richer than that of generalized linear models. Finally we extend the model and the theoretical results to functions with outputs in convex cones. The paper is complemented by an experimental evaluation of the model showing its effectiveness in terms of formulation, algorithmic derivation and practical results on the problems of density estimation, regression with heteroscedastic errors, and multiple quantile regression.
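Concretely, the representer theorem gives models of the form f(x) = k_x^T B k_x with B positive semidefinite, where k_x = (k(x, x_1), ..., k(x, x_n)); non-negativity then holds by construction. Below is a minimal sketch fitting such a model by regularized least squares; the Gaussian kernel, the squared loss, the trace-type regularizer and cvxpy are illustrative choices of mine, not the paper's specific algorithm.

    # Sketch: fit a non-negative function as f(x) = k_x^T B k_x with B PSD.
    import numpy as np
    import cvxpy as cp

    def gauss_kernel(A, C, bw=0.3):
        sq = ((A[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * bw ** 2))

    def fit_nonneg(X, y, lam=1e-2):
        """Least-squares fit of the PSD model to non-negative targets y."""
        n = len(X)
        K = gauss_kernel(X, X)
        B = cp.Variable((n, n), PSD=True)
        preds = cp.hstack([cp.quad_form(K[i], B) for i in range(n)])
        obj = cp.Minimize(cp.sum_squares(preds - y) + lam * cp.trace(K @ B))
        cp.Problem(obj).solve()
        return B.value

    X = np.linspace(-1.0, 1.0, 25)[:, None]
    y = np.maximum(0.0, np.sin(3 * X)).ravel()   # a non-negative target
    B = fit_nonneg(X, y)
    k_x = gauss_kernel(np.array([[0.2]]), X)[0]  # features of a test point
    print(k_x @ B @ k_x)  # non-negative up to solver tolerance, since B is PSD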
U. Marteau-Ferey, F. Bach, A. Rudi. Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses. [arXiv, hal, pdf, poster], NeurIPS, 2019.
Abstract: In this paper, we study large-scale convex optimization algorithms based on the Newton method applied to regularized generalized self-concordant losses, which include logistic regression and softmax regression. We first prove that our new simple scheme based on a sequence of problems with decreasing regularization parameters is provably globally convergent, and that this convergence is linear with a constant factor which scales only logarithmically with the condition number. In the parametric setting, we obtain an algorithm with the same scaling as regular first-order methods but with an improved behavior, in particular in ill-conditioned problems. Second, in the non-parametric machine learning setting, we provide an explicit algorithm combining the previous scheme with Nyström projection techniques, and prove that it achieves optimal generalization bounds with a time complexity of order O(n df_λ), a memory complexity of order O(df_λ^2) and no dependence on the condition number, generalizing the results known for least-squares regression. Here n is the number of observations and df_λ is the associated degrees of freedom. In particular, this is the first large-scale algorithm to solve logistic and softmax regressions in the non-parametric setting with large condition numbers and theoretical guarantees.
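To illustrate the first part, here is a minimal sketch of a decreasing-regularization Newton scheme on l2-regularized logistic regression; the starting value lam0, the halving schedule and the number of Newton steps per stage are placeholder choices of mine, not the constants prescribed by the paper's analysis.

    # Sketch: Newton's method with a decreasing sequence of regularization
    # parameters, warm-starting each stage at the previous solution.
    import numpy as np

    def sigmoid(t):
        return 1.0 / (1.0 + np.exp(-t))

    def newton_step(w, X, y, lam):
        """One Newton step on (1/n) sum log(1 + exp(-y_i x_i^T w)) + lam/2 |w|^2."""
        n = len(y)
        p = sigmoid(-y * (X @ w))                 # p_i = sigma(-y_i x_i^T w)
        grad = -(X.T @ (y * p)) / n + lam * w
        H = (X.T * (p * (1 - p))) @ X / n + lam * np.eye(X.shape[1])
        return w - np.linalg.solve(H, grad)

    def newton_decreasing_reg(X, y, lam_target, lam0=1.0, steps_per_stage=2):
        w = np.zeros(X.shape[1])
        lam = lam0
        while lam > lam_target:                   # stages with shrinking lam
            for _ in range(steps_per_stage):
                w = newton_step(w, X, y, lam)
            lam = max(lam / 2.0, lam_target)
        for _ in range(steps_per_stage):          # final stage at the target
            w = newton_step(w, X, y, lam_target)
        return w

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 10))
    y = np.sign(X @ rng.normal(size=10) + 0.1 * rng.normal(size=500))
    w = newton_decreasing_reg(X, y, lam_target=1e-6)  # ill-conditioned target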
U. Marteau-Ferey, D. Ostrovskii, F. Bach, A. Rudi. Beyond Least-Squares: Fast Rates for Regularized Empirical Risk Minimization through Self-Concordance. [arXiv, hal, pdf, poster, slides, video], COLT, 2019.
Abstract: We consider learning methods based on the regularization of a convex empirical risk by a squared Hilbertian norm, a setting that includes linear predictors and non-linear predictors through positive-definite kernels. In order to go beyond the generic analysis leading to convergence rates of the excess risk as O(1/√n) from n observations, we assume that the individual losses are self-concordant, that is, their third-order derivatives are bounded by their second-order derivatives. This setting includes least-squares, as well as all generalized linear models such as logistic and softmax regression. For this class of losses, we provide a bias-variance decomposition and show that the assumptions commonly made in least-squares regression, such as the source and capacity conditions, can be adapted to obtain fast non-asymptotic rates of convergence by improving the bias terms, the variance terms or both.
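As a concrete instance of this assumption, one can check it directly on the logistic loss; the short derivation below is standard and not taken verbatim from the paper.

    % Logistic loss and sigmoid:
    \[
      \varphi(t) = \log(1 + e^{-t}), \qquad \sigma(t) = \frac{1}{1 + e^{-t}},
    \]
    % second and third derivatives:
    \[
      \varphi''(t) = \sigma(t)\,\sigma(-t), \qquad
      \varphi'''(t) = \varphi''(t)\,\bigl(\sigma(-t) - \sigma(t)\bigr),
    \]
    % hence the third derivative is bounded by the second:
    \[
      |\varphi'''(t)| \le \varphi''(t),
      \quad \text{since } |\sigma(-t) - \sigma(t)| < 1 \text{ for all } t.
    \]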
