The class will be taught
in French or English, depending on attendance (all slides and class
notes are in English).
Dates | Course topics | Lecture notes from 2015/2016 |
Thursday 24/01, 2pm to 5pm | Introduction to the generic learning problem; classes of functions (convex, smooth, strongly convex); statistical analysis via Rademacher complexity | lecture1.pdf |
Thursday 31/01, 2pm to 5pm | Analysis of deterministic optimization algorithms (gradient descent, Newton, etc.) | lecture2.pdf |
Thursday 14/02, 2pm to 5pm | Stochastic approximation for non-smooth convex problems | lecture4.pdf |
Thursday 28/02, 2pm to 5pm | Classical stochastic approximation (asymptotic analysis) | lecture3.pdf |
Thursday 14/03, 2pm to 5pm | Stochastic approximation for smooth convex problems | |
Thursday 21/03, 2pm to 5pm | Minimization of finite sums | |
Thursday 04/04, 2pm to 5pm | Written exam |
Homework 1 (Batch algorithms), due Thursday 14/02 |
The file data_orsay_2017.mat (Matlab format, readable from both Matlab and Python) contains training and testing data for a binary classification problem with d=100, n=10,000 training observations and n=100,000 testing observations. For the two loss functions (square, logistic) and the two numbers of training observations (10,000 and 1,000), plot training and testing errors (on the same plot) as a function of the number of iterations of gradient descent with constant step size. This makes a total of 4 plots (2 loss functions x 2 numbers of observations), to be put in a single pdf file, with at most two sentences of comments per plot. |
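A minimal sketch of the gradient-descent loop for the logistic loss is below. The field names inside data_orsay_2017.mat are not specified here, so synthetic data stands in for it; in practice one would load the file with scipy.io.loadmat and substitute its train/test arrays. The constant step size 1/L, with L the smoothness constant of the logistic objective, is one safe choice, not the only one.

```python
import numpy as np

def logistic_loss(w, X, y):
    """Average logistic loss; labels y are in {-1, +1}."""
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def logistic_grad(w, X, y):
    """Gradient of the average logistic loss (numerically stable form)."""
    coef = np.exp(-np.logaddexp(0.0, y * (X @ w)))  # = 1 / (1 + exp(y * Xw))
    return -(X.T @ (y * coef)) / len(y)

def gradient_descent(X, y, X_test, y_test, step, iters):
    """Constant-step-size gradient descent, recording train/test loss at each iteration."""
    w = np.zeros(X.shape[1])
    train, test = [], []
    for _ in range(iters):
        train.append(logistic_loss(w, X, y))
        test.append(logistic_loss(w, X_test, y_test))
        w -= step * logistic_grad(w, X, y)
    return w, train, test

# Synthetic stand-in for data_orsay_2017.mat (its field names are not known here).
rng = np.random.default_rng(0)
d, n = 100, 1000
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d) + 0.1)
L = np.linalg.norm(X, 2) ** 2 / (4 * n)  # smoothness constant of the logistic objective
w, train, test = gradient_descent(X, y, X, y, step=1.0 / L, iters=200)
```

The recorded `train` and `test` lists are what gets plotted against the iteration counter; for the square loss, only the loss and gradient functions change.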
Homework 2 (Non-smooth stochastic algorithms), due Thursday 14/03 |
With the same data, and the full training data (10,000 observations), we consider logistic regression and three comparisons for stochastic gradient descent (SGD), all with 20 passes over the data. For all 4 plots, compare averaged testing logistic losses (as a function of the number of iterations). Advice: only compute testing losses every 100 iterations.
(a) No averaging vs. averaging: compare averaged SGD and plain SGD with step sizes decreasing as 1/sqrt(n) (n being the iteration number). Single plot + one sentence of comments.
(b) Strongly convex vs. convex (averaged SGD): consider adding mu/2 times the squared norm to the objective (do not forget to add it to the gradient and to the testing objective as well). Compare the step size in 1/(mu n) with the step size in 1/sqrt(n), for mu = 1e-1 and mu = 1e-5. Two plots overall + two sentences of comments.
(c) Cycling vs. sampling with replacement: run SGD with multiple passes over the training data, comparing cycling through the data with sampling with replacement. Single plot + one sentence of comments. |
Homework 3 (Smooth stochastic algorithms + finite data sets), due Thursday 04/04 |
We consider the same data, for logistic regression and least-squares regression with regularization by mu times the squared Euclidean norm (with mu=1e-1 and mu=1e-5). Compare constant-step-size stochastic gradient descent (sampling with replacement, 50 passes) and SAG. This makes overall: 2 (losses) x 2 (values of mu) x 2 (algorithms) = 8 plots. |