The class will be taught in French or English, depending on attendance (all slides and class notes are in English).
Summary
Prerequisites: We will prove results in class, so a good knowledge of undergraduate mathematics is important, as well as basic notions of probability. Having taken an introductory class on machine learning is beneficial.
All classes will be "in real life" at ENS (29, rue d'Ulm), on Fridays between 9am and 12pm, in the room Paul Langevin (1st floor), except on December 9 (Salle des Actes, 45, rue d'Ulm).
The class will follow the book in preparation (draft available here; since it will be updated frequently, please get the latest version).
Each student will benefit more from the class if the corresponding sections are read before class.
Date | Topics | Book chapters and figures to reproduce |
October 7 | Learning with infinite data (population setting): decision theory (loss, risk, optimal predictors); decomposition of excess risk into approximation and estimation errors; no free lunch theorems; basic notions of concentration inequalities (McDiarmid, Hoeffding, Bernstein) | Chapter 2, Figures 2.1 and 2.2 |
October 14 | Linear least-squares regression: guarantees in the fixed design setting (simple, in closed form); ridge regression and dimension-independent bounds; guarantees in the random design setting; lower bound on performance | Chapter 3, Figures 3.1, 3.2 and 3.3 |
October 28 (no class on October 21) | Empirical risk minimization: convexification of the risk; risk decomposition; estimation error (finite number of hypotheses and covering numbers); Rademacher complexity; penalized problems | Chapter 4 |
November 4 | Optimization for machine learning: gradient descent; stochastic gradient descent; generalization bounds through stochastic gradient descent | Chapter 5, Figures 5.1, 5.2 and 5.3 |
November 18 | Local averaging techniques: partition estimators; Nadaraya-Watson estimators; k-nearest neighbors; universal consistency | Chapter 6, Figure 6.5 (only for k-nn) |
November 25 | Kernel methods: kernels and representer theorems; algorithms; analysis of well-specified models; sharp analysis of ridge regression; universal consistency | Chapter 7, Figure 7.3 |
December 2 | Model selection: L0 penalty; L1 penalty; high-dimensional estimation | Chapter 8, Figure 8.2 (only for two dimensions d = 2^8) |
December 9 (Salle des Actes, 45, rue d'Ulm) | Neural networks: single hidden layer neural networks; estimation error; approximation properties and universality | Chapter 9 |
December 16 | Exam |
Evaluation
One written in-class exam, and (very) simple coding assignments (to illustrate convergence results, to be sent to learning.theory.first.principles@gmail.com). For all classes, the coding assignment is to reproduce the experiments shown in the book draft and send only the figures to the address above (which is used only for this purpose; all other enquiries should go to francis.bach@ens.fr).
New this year! For a group of a few volunteers who are good Python coders, the goal will be to reproduce the figures and produce code that will be posted on the book website. Matlab code will be provided to make sure the results are the same. This will replace the coding assignments and will come with a bonus in the final grade. Another group can do the same for the Julia language.
New this year! The draft book is almost finished, and I am still looking for feedback (typos, unclear parts). Please help (this comes with some bonus in the final grade)!