Introduction to Machine Learning

Francis Bach, Lenaïc Chizat

Mastere M2 ICFP, 2019/2020

 

Mandatory registration

The class will be taught in French or English, depending on attendance (all slides and class notes are in English).



Summary 

Statistical machine learning is a growing discipline at the intersection of computer science and applied mathematics (probability/statistics, optimization, etc.) that plays an increasingly important role in many other scientific disciplines.

Unlike traditional statistics, statistical machine learning focuses on the analysis of high-dimensional data and on the efficiency of algorithms for processing the large amounts of data encountered in application areas such as image or sound analysis, natural language processing, bioinformatics, or finance.

The objective of this class is to present the main theories and algorithms in statistical machine learning, with simple proofs of the most important results. The practical sessions will lead to simple implementations of the algorithms seen in class.



Dates


Classes will be held in room L367 (third floor, ENS, 24 rue Lhomond) on Friday mornings, from 9am to 12pm when there is no practical session and until 12.30pm when there is one. Class notes will be made available. Practical sessions will be held on laptops with Python 3 and Jupyter notebooks (please make sure to install them before January 17, and run this script).
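A quick way to verify the local setup before the first practical session is a small self-check like the sketch below (a hypothetical helper, not the course's own install script; the package names `numpy`, `matplotlib`, and `notebook` are assumptions about what the sessions will need):

```python
# Minimal environment self-check for the practical sessions (a sketch;
# the install script linked from this page is authoritative).
import sys


def check_environment():
    """Return a list of problems found with the local Python setup."""
    problems = []
    # The practical sessions require Python 3.
    if sys.version_info < (3,):
        problems.append("Python 3 is required")
    # Assumed packages for the notebooks; adjust to the actual script.
    for module in ("numpy", "matplotlib", "notebook"):
        try:
            __import__(module)
        except ImportError:
            problems.append("missing package: " + module)
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("OK" if not issues else "; ".join(issues))
```

Running the file prints either `OK` or the list of missing pieces, so problems surface before class rather than during it.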

Homeworks
Please send the practical sessions (one Jupyter notebook .ipynb file with cells containing either text or runnable code) to lenaicfrancisml@gmail.com with the subject [PSn], where n is the number of the practical session (no acknowledgement will be sent back).
 

Lecturer Date Topics Class notes / code
FB 10 January Introduction to supervised learning (loss, risk, over-fitting and capacity control + cross-validation, Bayes predictor for classification and regression) lecture1.pdf
LC 17 January Least-squares regression (all aspects, from linear algebra to statistical guarantees and L2 regularization + practical session) lecture2.pdf, TD1-1.ipynb, mnist_digits.mat (Practical session 1, due February 7, 2020)

FB 24 January Statistical ML without optimization (learning theory, from finite number of hypothesis to Rademacher / covering numbers) lecture3.pdf
FB 31 January Local averaging techniques (K-nearest neighbor, Nadaraya-Watson regression: algorithms + statistical analysis + practical session) lecture4.pdf, TD2-KNN.ipynb
LC 7 February Empirical risk minimization (logistic regression, loss-based supervised learning, probabilistic interpretation through maximum likelihood) lecture5.pdf
LC 14 February Convex optimization (gradient descent + nonsmooth + stochastic versions + practical session (logistic regression)) lecture6.pdf, TD3.ipynb

21 February Holidays
FB 28 February Model selection (feature selection, L1 regularization and high-dimensional inference + practical session) lecture7.pdf, TD4.ipynb
FB 6 March Kernels (positive-definite kernels and reproducing kernel Hilbert spaces) lecture8.pdf
LC 13 March Neural networks (from one-hidden layer to deep networks + practical session) lecture9.pdf, TD-NN.ipynb
LC 20 March Unsupervised learning (K-means and PCA (potentially with kernels) + mixture models (potentially EM) + practical session)

27 March Review

3 April Exam



Evaluation

Practical sessions to be finished at home, plus a written in-class exam.