The class will be taught in French or English, depending on attendance (all slides and class notes are in English).
      
| Dates | Course topics | Lecture notes from 2015/2016 |
| --- | --- | --- |
| Thursday 01/02, 2pm-5pm | Introduction to the generic learning problem; classes of functions (convex, smooth, strongly convex); statistical analysis via Rademacher complexity | lecture1.pdf |
| Thursday 08/02, 2pm-5pm | Analysis of deterministic optimization algorithms (gradient descent, Newton, etc.) | lecture2.pdf |
| Thursday 15/02, 2pm-5pm | Stochastic approximation for non-smooth convex problems | lecture4.pdf |
| Thursday 22/03, 2pm-5pm | Classical stochastic approximation (asymptotic analysis) | lecture3.pdf |
| Thursday 29/03, 2pm-5pm | Stochastic approximation for smooth convex problems | |
| Thursday 05/04, 2pm-5pm | Minimization of finite sums | |
Homework 1 (Batch algorithms), due Thursday 29/02: The file data_orsay_2017.mat (Matlab format, readable from Matlab and Python) contains training and testing data for a binary classification problem with d=100, n=10 000 training observations, and n=100 000 testing observations. For the two loss functions (square, logistic) and the two numbers of training observations (10 000 and 1 000), plot training and testing errors (on the same plot) as a function of the number of iterations of gradient descent with constant step size. This makes a total of 4 plots (2 loss functions x 2 numbers of observations), to be put in a single PDF file, with at most two sentences of comments for each plot.
        
Homework 2 (Nonsmooth stochastic algorithms), due Thursday 22/03: With the same data, and the full training set (10 000 observations), we consider logistic regression and three experiments with stochastic gradient descent, all with 20 passes over the data. For all 4 resulting plots, compare averaged testing logistic losses as a function of the number of iterations (advice: only compute testing losses every 100 iterations).
(a) No averaging vs. averaging: compare averaged SGD and plain SGD with step sizes decreasing as 1/sqrt(n). Single plot + one sentence of comments.
(b) Strongly convex vs. convex (averaged SGD): add mu/2 times the squared norm to the objective (do not forget to add it to the gradient and the testing objective as well), and compare the step size in 1/(mu n) with the step size in 1/sqrt(n), for mu = 1e-1 and 1e-5. Two plots overall + two sentences of comments.
(c) Cycling vs. sampling with replacement: run SGD with multiple passes over the training data, comparing cycling through the data with sampling with replacement. Single plot + one sentence of comments.
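For part (a), a hedged sketch in Python is shown below: plain SGD vs. Polyak-Ruppert averaged SGD on the logistic loss, with step sizes decreasing as 1/sqrt(n) in the statement's notation (written 1/sqrt(t) below for the iteration counter t). The data variables and the step-size constant C=1.0 are assumptions; the testing logistic loss of the recorded plain and averaged iterates can then be plotted on the same axes.

```python
import numpy as np

def sgd_logistic(X, y, n_passes=20, C=1.0, seed=0):
    # Plain SGD and its Polyak-Ruppert average on the (unregularized) logistic
    # loss, sampling with replacement, step size gamma_t = C / sqrt(t).
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)
    plain_iterates, averaged_iterates = [], []
    for t in range(1, n_passes * n + 1):
        i = rng.integers(n)                                   # sample with replacement
        x_i, y_i = X[i], y[i]
        grad = -y_i * x_i / (1.0 + np.exp(y_i * (x_i @ w)))   # stochastic gradient
        w = w - (C / np.sqrt(t)) * grad                       # decreasing step size
        w_avg += (w - w_avg) / t                              # running average of iterates
        if t % 100 == 0:                                      # record every 100 iterations
            plain_iterates.append(w.copy())
            averaged_iterates.append(w_avg.copy())
    return plain_iterates, averaged_iterates
```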
        
Homework 3 (Smooth stochastic algorithms + finite data sets), due Thursday 12/04: We consider the same data, for logistic regression and least-squares regression with regularization by mu times the squared Euclidean norm (with mu=1e-1 and mu=1e-5). Compare constant-step-size stochastic gradient descent (sampling with replacement, 50 passes) and SAG. This makes overall 2 (losses) x 2 (values of mu) x 2 (algorithms) = 8 plots.
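A hedged sketch of SAG for the regularized logistic-regression case is below; the constant-step-size SGD baseline follows the same loop with the stored-gradient table removed. The regularizer is written as (mu/2)||w||^2 (adjust the factor if the assignment means mu times the squared norm), and the step size 1/(16 L) with L = max_i ||x_i||^2 / 4 + mu is a common heuristic; both, like the variable names, are assumptions rather than part of the assignment.

```python
import numpy as np

def sag_logistic(X, y, mu, n_passes=50, seed=0):
    # SAG on f(w) = (1/n) sum_i log(1 + exp(-y_i x_i^T w)) + (mu/2) ||w||^2,
    # keeping one stored gradient per observation and updating their average.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    stored = np.zeros(n)      # scalar factor of each stored per-sample gradient
    grad_sum = np.zeros(d)    # sum of the stored per-sample gradients
    L = 0.25 * np.max(np.sum(X ** 2, axis=1)) + mu   # smoothness bound (assumption)
    step = 1.0 / (16.0 * L)
    iterates = []
    for t in range(1, n_passes * n + 1):
        i = rng.integers(n)                                      # sample with replacement
        new_scalar = -y[i] / (1.0 + np.exp(y[i] * (X[i] @ w)))   # fresh gradient factor
        grad_sum += (new_scalar - stored[i]) * X[i]              # refresh sample i in the sum
        stored[i] = new_scalar
        w = w - step * (grad_sum / n + mu * w)                   # SAG step + exact ridge term
        if t % 100 == 0:                                         # record every 100 iterations
            iterates.append(w.copy())
    return w, iterates
```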