The class will be taught in French or English, depending on attendance (all slides and class notes are in English).
Dates | Course topics | Lecture notes from 2015/2016 |
Monday 24/01, 2pm-5pm, Francis Bach | Introduction to the generic learning problem; classes of functions (convex, smooth, strongly convex); statistical analysis via Rademacher complexity | lecture1.pdf, GoToMeeting link |
Monday 31/01, 2pm-5pm, Francis Bach | Analysis of deterministic optimization algorithms (gradient descent, Newton, etc.) | lecture2.pdf, GoToMeeting link |
Monday 07/02, 2pm-5pm, Aymeric Dieuleveut | Stochastic approximation for non-smooth convex problems | lecture4.pdf |
Monday 14/02, 2pm-5pm, Aymeric Dieuleveut | Classical stochastic approximation (asymptotic analysis) | lecture3.pdf |
Monday 28/02, 2pm-5pm, Aymeric Dieuleveut | Stochastic approximation for smooth convex problems | |
Monday 07/03, 2pm-5pm, Francis Bach | Minimization of finite sums | GoToMeeting link |
Monday 28/03, 2pm-5pm | Written exam | |
Homework 1 (Batch algorithms) | The file data_orsay_2017.mat (Matlab format, readable from Matlab and Python) contains training and testing data for a binary classification problem with d=100, n = 10,000 training observations, and n = 100,000 testing observations. For the two loss functions (square, logistic) and the two numbers of training observations (10,000 and 1,000), plot training and testing errors (on the same plot) as a function of the number of iterations of gradient descent with constant step size. This makes a total of 4 plots (2 loss functions x 2 numbers of observations), to be put in a single pdf file, with at most two sentences of comments for each plot. |
Homework 2 (Nonsmooth stochastic algorithms) | With the same data, and the full training data (10,000 observations), we consider logistic regression and three experiments with stochastic gradient descent, all with 20 passes over the data. For all four plots, compare averaged testing logistic losses as a function of the number of iterations (a sketch of parts (a) and (b) follows this list). Advice: only compute testing losses every 100 iterations.
(a) No averaging vs. averaging: compare averaged SGD and plain SGD with step sizes decreasing as 1/sqrt(n). Single plot + one sentence of comments.
(b) Strongly convex vs. convex (averaged SGD): consider adding mu/2 times the squared norm to the objective (do not forget to add it to the gradient and to the testing objective as well). Compare the step size in 1/(mu n) with the step size in 1/sqrt(n), for mu = 1e-1 and 1e-5. Two plots overall + two sentences of comments.
(c) Cycling vs. sampling with replacement: run SGD with multiple passes over the training data, comparing cycling through the data with sampling with replacement. Single plot + one sentence of comments.
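Below is a minimal sketch covering parts (a) and (b): SGD on the logistic loss with a decaying step size and a Polyak-Ruppert running average, reusing the (hypothetical) arrays Xtr, ytr, Xte, yte from the previous sketch. The constant C in front of 1/sqrt(t) is a tuning assumption, not part of the assignment.

```python
import numpy as np
from scipy.special import expit

def test_logistic_loss(w, X, y):
    return np.mean(np.logaddexp(0.0, -y * X.dot(w)))

def sgd_logistic(Xtr, ytr, Xte, yte, n_passes=20, C=1.0, mu=0.0,
                 step_rule='sqrt', eval_every=100, seed=0):
    """Plain and averaged SGD. step_rule='sqrt' gives C/sqrt(t);
    step_rule='mu' gives 1/(mu*t) for the strongly convex case (b)."""
    rng = np.random.default_rng(seed)
    n, d = Xtr.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)
    hist_plain, hist_avg = [], []
    t = 0
    for _ in range(n_passes):
        for i in rng.integers(0, n, size=n):   # sampling with replacement
            t += 1
            step = C / np.sqrt(t) if step_rule == 'sqrt' else 1.0 / (mu * t)
            # Single-sample logistic gradient, plus the mu-regularizer of (b).
            g = -ytr[i] * Xtr[i] * expit(-ytr[i] * Xtr[i].dot(w)) + mu * w
            w -= step * g
            w_avg += (w - w_avg) / t           # Polyak-Ruppert running average
            if t % eval_every == 0:            # advice: evaluate every 100 steps
                hist_plain.append(test_logistic_loss(w, Xte, yte)
                                  + 0.5 * mu * w.dot(w))
                hist_avg.append(test_logistic_loss(w_avg, Xte, yte)
                                + 0.5 * mu * w_avg.dot(w_avg))
    return hist_plain, hist_avg
```

For part (c), replacing `rng.integers(0, n, size=n)` with `range(n)` gives the cycling variant, with everything else unchanged.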
Homework 3 (Smooth stochastic algorithms + finite data sets) | We consider the same data, for logistic regression and least-squares regression with regularization by mu times the squared Euclidean norm (with mu = 1e-1 and mu = 1e-5). Compare constant-step-size stochastic gradient descent (sampling with replacement, 50 passes) and SAG; a sketch of SAG follows below. This makes overall 2 (loss functions) x 2 (values of mu) x 2 (algorithms) = 8 plots. |