Introduction to Machine Learning (2019 - 2020)


MANDATORY REGISTRATION


Summary

Statistical machine learning is a growing discipline at the intersection of computer science and applied mathematics (probability, statistics, optimization, etc.) that plays an increasingly important role in technological innovation.

Unlike traditional statistics, statistical machine learning focuses particularly on the analysis of high-dimensional data and on the efficiency of algorithms that process the large amounts of data encountered in many application areas, such as image or sound analysis, natural language processing, bioinformatics, and finance.

The objective of this class is to present the main theories and algorithms of statistical machine learning. The methods covered rely, among other tools, on arguments from convex analysis. The practical sessions (more than half of which will be done on computers) will lead to simple implementations of the algorithms seen in class, with applications to various domains such as computer vision and natural language processing.

Prerequisites: probability theory (random variables, convergence of random variables, conditional expectation) and coding skills in Python.


General information

This class is part of the computer science courses taught to L3 students at ENS in the spring semester of 2019-2020.

Teachers: Pierre Gaillard and Alessandro Rudi.
Practical sessions: Raphaël Berthier.

The class lasts 52 hours (30 hours of lectures + 22 hours of practical sessions) and counts for 12 ECTS credits.
Final grade: approximately 50% final exam, 50% homework.

Previous years: Fall 2019, Fall 2018, 2017, 2016, 2015, 2014, 2013, 2012


Schedule and lecture notes

Classes take place on Thursday mornings from 8h30 to 12h30 in room UV. A typical session consists of a lecture from 8h30 to 10h20, followed by a 20-minute break and practical work (PW) from 10h40 to 12h30. Bring your personal laptop to the practical sessions! Lecture notes and solutions to the practical work and exercises will be posted here as the course progresses.

Home assignment 1: (Download here). It is due by April 22, 2020. It must be returned by email to as a PDF report of at most 3 pages, together with the IPython notebook used for the code. The results and figures must be included in the PDF report, but not the code.

Home assignment 2: (Download here). It is due by May 20, 2020. It must be returned by email to as a PDF report of at most 3 pages, together with the IPython notebook used for the code.


Final exam

Date: Thursday, May 28 from 8:30 to 12:30.
Place: From home
Subject: Download (Do not do question 15)
Videoconference link: Gotomeeting
Examples of exercises from previous years: Exercises (solutions)

Click here to upload your file


The exam takes the form of a time-limited assignment done from home. You may use any resources at your disposal. You must return your copy before 12:50 on May 28th, 2020 by uploading it here. Do not be late.

Late submissions are penalized by one point on the final grade for every 10 minutes of delay (-1 point if returned between 12h50 and 13h, -2 points if returned between 13h and 13h10, and so on).
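The penalty rule amounts to counting started 10-minute blocks after the deadline. The sketch below is only an illustration of that arithmetic; the helper name `late_penalty` is hypothetical, and it assumes that a copy submitted at exactly 12:50 is on time and that a copy submitted at exactly 13:00 still falls in the first (-1 point) block.

```python
import math
from datetime import datetime

# Deadline stated in the exam instructions (May 28th, 2020 at 12:50).
DEADLINE = datetime(2020, 5, 28, 12, 50)


def late_penalty(submitted_at: datetime, deadline: datetime = DEADLINE) -> int:
    """Hypothetical helper: points deducted from the final grade.

    Assumption: every started 10-minute block after the deadline costs
    one point; submissions at or before the deadline cost nothing.
    """
    delay_minutes = (submitted_at - deadline).total_seconds() / 60
    if delay_minutes <= 0:
        return 0
    # Round up so that 12:55 costs 1 point and 13:05 costs 2 points.
    return math.ceil(delay_minutes / 10)


print(late_penalty(datetime(2020, 5, 28, 12, 55)))  # 1
print(late_penalty(datetime(2020, 5, 28, 13, 5)))   # 2
```

Rounding up with `math.ceil` matches the examples given in the rule (-1 for 12h50 to 13h, -2 for 13h to 13h10); only the exact boundary minutes depend on the stated assumption.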

If the upload fails for some reason, let me know on GoToMeeting and try again a few minutes later. If it still fails, send your file by email to and , but only after you have tried the upload.

Your copy must be written in English and submitted as a single PDF file of at most 40 MB. You may photograph a handwritten copy with your phone, but the photos must be readable and combined into a single PDF file. Please practice doing this before the exam.

The exam will consist of about four parts covering different topics of the course. The first part will consist of course questions (without technical proofs) to check that you have understood the overall content. The other three exercises will be more technical and independent of each other. The exam will probably be too long to finish. Don't panic: since the exercises are independent, choose the ones you prefer.




Planning


# | Date | Teacher(s) | Title | Materials
1 | 06/02/2020 | P. Gaillard | Introduction | -
2 | 13/02/2020 | P. Gaillard, R. Berthier | Supervised learning and linear regression | TD1 (Data: classificationA_train, classificationA_test, classificationB_train, classificationB_test, classificationC_train, classificationC_test, mnist_digits.mat), solution
- | 20/02/2020 | - | Vacation | -
3 | 27/02/2020 | P. Gaillard, R. Berthier | Unsupervised Learning | -
4 | 05/03/2020 | - | No Class | -
5 | 12/03/2020 | - | - | -
6 | 19/03/2020 | A. Rudi, R. Berthier | Logistic regression and convex analysis | TD3, solution
7 | 26/03/2020 | A. Rudi, R. Berthier | Convex optimization | TD4, solution to theoretical questions, solution to practical questions
8 | 02/04/2020 | P. Gaillard, R. Berthier | High dimensional statistics (Lasso) | Practical session on SGD: TD5, data, solution
- | 09/04/2020 | - | Vacation | -
- | 16/04/2020 | - | Vacation | -
9 | 23/04/2020 | A. Rudi, R. Berthier | Kernels | Exercise sheet, solution
10 | 30/04/2020 | A. Rudi, R. Berthier | Elements of Statistical Machine Learning | Numerical tour of Ridge and Lasso by Gabriel Peyre
11 | 07/05/2020 | A. Rudi, R. Berthier | Local methods | Probabilistic modeling and maximum likelihood estimation, solution to the exercises
12 | 14/05/2020 | A. Rudi, R. Berthier | Neural networks | TP Neural Nets - solution
- | 21/05/2020 | - | Ascension (no class) | -
13 | 28/05/2020 | P. Gaillard | Exam | -