Optimisation et Apprentissage Statistique

Francis Bach

INRIA - Ecole Normale Supérieure

Master M2 Mathématiques de l'aléatoire

(Université Paris-Sud), 2eme semestre, 2015/2016

The class is taught in English (all slides and class notes are also in English)

Résumé

De nombreux problèmes d'estimation en apprentissage statistique sont formulés comme des problèmes d'optimisation. Ces formulations ont permis une separation entre l'analyse des performances de l'estimateur et le développement d'algorithmes de résolution du problème. Face aux grands volumes de données, une telle séparation n'est plus efficace et l'analyse doit mêler statistique et optimisation. Dans ce cours, seront présentés de manière unifiée les résultats statistiques classiques sur la M-estimation et l'optimisation convexe, en montrant les liens entre statistique et optimisation. Un accent sera mis sur l'analyse non-asymptotique de l'approximation stochastique et du gradient stochastique.

Références

Transparents (version actuelle)

S. Bubeck. Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning, 8(3-4):231–357, 2015.

Y. Nesterov. Introductory lectures on convex optimization: a basic course. Kluwer Academic Publishers, 2004.

Dates et contenu des cours

Les cours ont lieu au département de Mathématiques de l'Université Paris-Sud (Orsay, salle 225-227), le jeudi de 14h à 17h, avec certains cours de remplacement dont les dates seront decidees avec les eleves.

Dates	Sujets du cours	Notes de cours - scribes
Jeudi 18/02 de 14h à 17h	Introduction au probleme generique d'apprentissage Classes de fonctions (convexes, lisses, fortement convexes) Analyse statistique par complexite de Rademacher	Benjamin Goehry, Antoine Havet-Morel lecture1.pdf
Jeudi 25/02 de 14h à 17h	Analyse des algorithmes deterministes d'optimisation (descente de gradient, Newton, etc.)	Guillaume Maillard, Nicolas Brosse lecture2.pdf
Jeudi 17/03 de 14h à 17h	Approximation stochastique classique (analyse asymptotique)	Yichen Liu, Rui Li lecture3.pdf
Jeudi 24/03 de 14h à 17h	Approximation stochastique pour problemes convexes non-lisses	Yaheng Cui, Mounia Hamidouche lecture4.pdf
Jeudi 14/04 de 14h à 17h	Approximation stochastique pour problemes convexes lisses
Lundi 18/04 de 14h à 17h	Minimisation de sommes finies
Jeudi 21/04 de 14h à 17h	Evaluation Orale

Validation

Le cours sera valide par (1) une lecture d'article avec restitution orale et (2) une prise de note sous latex d'un des cours.

In order to obtain a grade for this class, students have to (1) read a research article with an oral presentation an (2) type the scribe notes in latex for one the lectures (two students per lecture).

Given the number of students, the session where students present research articles will be a poster session, where every student will prepare 8-12 slides to (a) present the main ideas of the papers, (b) its relationship to the class, (c) potentially the main elements of proofs, and (d) if applicable a simple simulation showing the benefits of the proposed method.

The slides may be prepared in French or English. Note that in some cases it may be worth selecting a relevant subset of results to present.

There are two target audiences: (a) the lecturer (supposed to be an expert who already read the papers) and (b) other students (supposed to know the material seen in class, but typically not the papers). The poster should be adaptive to these two target audiences.

Here is a list of potential papers (if you have any suggestions, please let me know). Papers will be allocated in a first-come / first-serve basis by sending email to Francis Bach (only one paper per student).

A. Dieuleveut, F. Bach. Non-parametric Stochastic Approximation with Large Step sizes. [pdf] [assigned to Samuel Doucet]

Hongzhou Lin, Julien Mairal, Zaid Harchaoui. A Universal Catalyst for First-Order Optimization. [pdf] [assigned to Rui Chen] [assigned to Rui Li]

M. Schmidt, N. Le Roux, F. Bach. Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization. [pdf] [assigned to Benjamin Goehry]

Shai Shalev-Shwartz, Nathan Srebro, Karthik Sridharan. Fast Rates for Regularized Objectives. [pdf] [assigned to Mounia Hamidouche]

F. Orabona. Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning. [pdf] [assigned to Amani Arfaoui]

Cameron Musco, Christopher Musco. Randomized Block Krylov Methods for Stronger and Faster Approximate Singular Value Decomposition. [pdf] [assigned to Antoine Havet-Morel] [assigned to Mladen Dimovski]

A. Defossez and F. Bach. Averaged least-mean-squares: bias-variance trade-offs and optimal sampling distributions. [pdf] [assigned to Yichen Liu]

G. Lan. An optimal method for stochastic composite optimization. [pdf]

A. Dieuleveut, N. Flammarion, and F. Bach. Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression. [pdf] [assigned to Léo Miolane]

A. Defazio, F. Bach, S. Lacoste-Julien. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives. [pdf]

Anatoli Juditsky, Arkadi Nemirovski. First Order Methods for Nonsmooth Convex Large-Scale Optimization, I: General Purpose Methods. [pdf] [assigned to Yaheng Cui]

Anatoli Juditsky, Arkadi Nemirovski. First Order Methods for Nonsmooth Convex Large-Scale Optimization, II: Utilizing Problem's Structure. [pdf] [assigned to Nicolas Brosse]

Y. Nesterov. Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems. [pdf] [assigned to Dilek Savas]

S. Lacoste-Julien and M. Jaggi. On the Global Linear Convergence of Frank-Wolfe Optimization Variants. [pdf] [assigned to Sendi Samira]

Elad Hazan and Satyen Kale. Beyond the Regret Minimization Barrier: Optimal Algorithms for Stochastic Strongly-Convex Optimization. [pdf] [assigned to Guillaume Maillard] [assigned to Malo Huard]