The class will be taught in French or English, depending on attendance (all slides and class notes are in English).
      
| Dates | Course topics | Lecture notes (from 2015/2016) |
|---|---|---|
| Thursday 26/01, 2pm-5pm | Introduction to the generic learning problem; classes of functions (convex, smooth, strongly convex); statistical analysis via Rademacher complexity | lecture1.pdf |
| Thursday 02/02, 2pm-5pm, Amphi G3, Building 450 | Analysis of deterministic optimization algorithms (gradient descent, Newton, etc.) | lecture2.pdf |
| Thursday 23/02, 2pm-5pm, Room B13, Building 460 | Stochastic approximation for non-smooth convex problems | lecture4.pdf |
| Thursday 09/03, 2pm-5pm | Classical stochastic approximation (asymptotic analysis) | lecture3.pdf |
| Thursday 16/03, 2pm-5pm | Stochastic approximation for smooth convex problems | |
| Thursday 23/03, 2pm-5pm | Minimization of finite sums | |
| Thursday 06/04, 2pm-5pm | Oral evaluation, Room 225-227, Building 425 | |

Homework 1 (Batch algorithms), due Thursday 23/02. The file data_orsay_2017.mat (Matlab format, readable from Matlab and Python) contains training and testing data for a binary classification problem with d = 100 features, 10,000 training observations, and 100,000 testing observations. For the two loss functions (square, logistic) and the two numbers of training observations (10,000 and 1,000), plot training and testing errors (on the same plot) as a function of the number of iterations of gradient descent with constant step size. This makes a total of 4 plots (2 loss functions x 2 numbers of observations), to be put in a single PDF file, with at most two sentences of comments per plot.
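
A minimal Python sketch of the gradient-descent loop for the logistic loss on the full training set is given below; the square loss and the 1,000-observation subset follow the same pattern. The key names ('Xtrain', 'ytrain', 'Xtest', 'ytest') and labels in {-1, +1} are assumptions about the contents of data_orsay_2017.mat, so check data.keys() after loading.

```python
# Sketch: batch gradient descent with constant step size (logistic loss, full data).
# The .mat keys ('Xtrain', 'ytrain', ...) and labels in {-1, +1} are assumptions.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat
from scipy.special import expit

data = loadmat('data_orsay_2017.mat')
Xtrain, ytrain = data['Xtrain'], data['ytrain'].ravel()  # assumed key names
Xtest, ytest = data['Xtest'], data['ytest'].ravel()

def logistic_loss_grad(w, X, y):
    """Average logistic loss and its gradient, labels assumed in {-1, +1}."""
    m = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -m))
    grad = -(X.T @ (y * expit(-m))) / X.shape[0]
    return loss, grad

n, d = Xtrain.shape
w = np.zeros(d)
gamma = 4.0 / np.max(np.sum(Xtrain ** 2, axis=1))  # constant step size <= 1/L for the logistic loss
train_loss, test_loss = [], []
for it in range(1000):
    loss, grad = logistic_loss_grad(w, Xtrain, ytrain)
    train_loss.append(loss)
    test_loss.append(logistic_loss_grad(w, Xtest, ytest)[0])
    w -= gamma * grad

plt.plot(train_loss, label='training loss')
plt.plot(test_loss, label='testing loss')
plt.xlabel('gradient descent iteration')
plt.ylabel('logistic loss')
plt.legend()
plt.savefig('hw1_logistic_n10000.pdf')
```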

Homework 2 (Non-smooth stochastic algorithms), due Thursday 09/03. With the same data and the full training set (10,000 observations), we consider logistic regression and three experiments with stochastic gradient descent, all with 20 passes over the data. For all four plots, compare averaged testing logistic losses as a function of the number of iterations (advice: only compute testing losses every 100 iterations).
(a) No averaging vs. averaging: compare averaged SGD and plain SGD with step sizes decreasing as 1/sqrt(n). Single plot, plus one sentence of comments.
(b) Strongly convex vs. convex (averaged SGD): add mu/2 times the squared norm to the objective (do not forget to add it to the gradient and to the testing objective as well), and compare the step size in 1/(mu n) with the step size in 1/sqrt(n), for mu = 1e-1 and mu = 1e-5. Two plots overall, plus two sentences of comments.
(c) Cycling vs. sampling with replacement: run SGD with multiple passes over the training data, comparing cycling through the data with sampling with replacement. Single plot, plus one sentence of comments.
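
One possible sketch of part (a) is given below, reusing the assumed variables Xtrain, ytrain, Xtest, ytest and the logistic_loss_grad helper from the previous sketch; parts (b) and (c) only change the objective, the step-size schedule, and the sampling order. The step-size constant gamma0 and the exact 1/sqrt(t) schedule are illustrative choices.

```python
# Sketch for part (a): plain SGD vs. averaged SGD, step size decreasing as 1/sqrt(t).
# Reuses the assumed Xtrain, ytrain, Xtest, ytest and logistic_loss_grad from above.
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import expit

rng = np.random.default_rng(0)
n, d = Xtrain.shape
n_iters = 20 * n                                    # 20 passes over the data
gamma0 = 1.0 / np.max(np.sum(Xtrain ** 2, axis=1))  # illustrative step-size constant

def run_sgd(averaged):
    w = np.zeros(d)
    w_bar = np.zeros(d)
    test_losses = []
    for t in range(1, n_iters + 1):
        i = rng.integers(n)                         # sampling with replacement
        xi, yi = Xtrain[i], ytrain[i]
        grad_i = -yi * expit(-yi * (xi @ w)) * xi   # single-example logistic gradient
        w = w - gamma0 / np.sqrt(t) * grad_i
        w_bar += (w - w_bar) / t                    # running average of the iterates
        if t % 100 == 0:                            # testing losses every 100 iterations
            w_eval = w_bar if averaged else w
            test_losses.append(logistic_loss_grad(w_eval, Xtest, ytest)[0])
    return test_losses

plain = run_sgd(averaged=False)
avg = run_sgd(averaged=True)
iters = 100 * np.arange(1, len(plain) + 1)
plt.plot(iters, plain, label='plain SGD')
plt.plot(iters, avg, label='averaged SGD')
plt.xlabel('iteration')
plt.ylabel('testing logistic loss')
plt.legend()
plt.savefig('hw2a_sgd_vs_averaged.pdf')
```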

Homework 3 (Smooth stochastic algorithms and finite data sets), due Thursday 06/04. We consider the same data, for logistic regression and least-squares regression with regularization by mu times the squared Euclidean norm (with mu = 1e-1 and mu = 1e-5). Compare constant-step-size stochastic gradient descent (sampling with replacement, 50 passes) and SAG. This makes overall 2 (losses) x 2 (values of mu) x 2 (algorithms) = 8 plots.
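
A possible sketch of SAG for l2-regularized logistic regression is given below, again reusing the assumed variables from the first sketch; the least-squares case and the constant-step-size SGD baseline follow the same structure with the corresponding gradients. The step size 1/L is an illustrative choice (the SAG analysis uses a smaller constant such as 1/(16 L)).

```python
# Sketch: SAG for logistic regression with l2 regularization mu * ||w||^2.
# Reuses the assumed Xtrain, ytrain, Xtest, ytest from the first sketch.
import numpy as np
from scipy.special import expit

def reg_test_loss(w, mu):
    m = ytest * (Xtest @ w)
    return np.mean(np.logaddexp(0.0, -m)) + mu * w @ w

def run_sag(mu, n_passes=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = Xtrain.shape
    w = np.zeros(d)
    # For linear models each stored gradient is scalar_i * x_i, so it is enough
    # to keep the scalar part and the running sum of stored gradients.
    stored = np.zeros(n)
    grad_sum = np.zeros(d)
    L = 0.25 * np.max(np.sum(Xtrain ** 2, axis=1)) + 2 * mu
    gamma = 1.0 / L                                # illustrative constant step size
    test_losses = []
    for t in range(n_passes * n):
        i = rng.integers(n)
        xi, yi = Xtrain[i], ytrain[i]
        new_scalar = -yi * expit(-yi * (xi @ w))   # derivative of loss_i w.r.t. xi.w
        grad_sum += (new_scalar - stored[i]) * xi  # update running sum of stored gradients
        stored[i] = new_scalar
        w -= gamma * (grad_sum / n + 2 * mu * w)   # SAG step, l2 term handled exactly
        if t % 1000 == 0:
            test_losses.append(reg_test_loss(w, mu))
    return test_losses

sag_losses = {mu: run_sag(mu) for mu in (1e-1, 1e-5)}
```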