Introduction to Machine Learning

Francis Bach, Lenaïc Chizat

Mastere M2 ICFP, 2020/2021

The class will be taught in French or English, depending on attendance (all slides and class notes are in English).

Summary

Statistical machine learning is a growing discipline at the intersection of computer science and applied mathematics (probability / statistics, optimization, etc.) that plays an increasingly important role in many other scientific disciplines.

Unlike traditional statistics, statistical machine learning focuses particularly on the analysis of high-dimensional data and on the efficiency of the algorithms that process the large amounts of data encountered in application areas such as image or sound analysis, natural language processing, bioinformatics or finance.

The objective of this class is to present the main theories and algorithms in statistical machine learning, with simple proofs of the most important results. The practical sessions will lead to simple implementations of the algorithms seen in class.

Dates - organisation

Given the public-health situation, we will use an online "flipped classroom" format. For every lecture, sections of the book in preparation (check regularly for the latest version) will be highlighted. Students are expected to study the material *before* Friday morning. The Friday morning online session will be split into three groups (each with a third of the students); in each group, the lecturer will give a quick overview of the material, after which students can ask questions. Each student has to ask at least one question. Practical sessions will be done at home.

Homework
Please send each practical session (one Jupyter notebook, .ipynb, with cells containing either text or runnable code) to lenaicfrancisml@gmail.com with the subject [PSn], where n is the number of the practical session (no acknowledgements will be sent back).

Lecturer	Date	Topics	Book sections
LC	15 January	Introduction to supervised learning (loss, risk, over-fitting and capacity control + cross-validation, Bayes predictor for classification and regression)	1.2.1, 1.2.4, 2.1, 2.2, 2.3, 2.4
FB	22 January	Least-squares regression (all aspects, from linear algebra to statistical guarantees and L2 regularization + practical session). Practical session 1, due February 12, 2021 (mnist_digits.mat)	3.1, 3.2, 3.3, 3.4, 3.5, 3.6
LC	29 January	Statistical ML without optimization (learning theory, from a finite number of hypotheses to Rademacher complexity / covering numbers)	4.1.1, 4.1.2, 4.2, 4.3, 4.4.(1-3), 4.5.(1-4)
FB	5 February	Convex optimization (gradient descent + nonsmooth + stochastic versions). Practical session 2, due March 5, 2021	5.1, 5.2.1, 5.2.2, 5.2.3, 5.3, 5.4 (not 5.4.1 and 5.4.2)
LC	12 February	Local averaging techniques (k-nearest neighbors, Nadaraya-Watson regression: algorithms + statistical analysis). Practical session 3, due March 12, 2021	6 (all sections except the diamond ones)
FB	19 February	Kernels (positive-definite kernels and reproducing kernel Hilbert spaces)	7 (all sections except the diamond ones)
	26 February	Holidays
LC	5 March	Model selection (feature selection, L1 regularization and high-dimensional inference + practical session). Practical session 4, due April 2, 2021	8 (all sections except the diamond ones)
FB	12 March	Neural networks (from a single hidden layer to deep networks). Practical session 5, not to be handed in	9 (all sections except the diamond ones)
LC	19 March	Special topics	10
FB	26 March	Review
	2-4 April	Final homework

Evaluation

-Practical sessions, to be done at home and sent to lenaicfrancisml@gmail.com

-Final homework at the end of the class