Learning Theory from First Principles

Francis Bach

Masters M2 IASD 2024/2025

 

Mandatory registration

The class will be taught in French or (most probably) English, depending on attendance (all slides and class notes are in English).


 

Summary 

The goal of this class is to present old and recent results in learning theory, for the most widely used learning architectures. This class is geared towards theory-oriented students as well as students who want to acquire a basic mathematical understanding of the algorithms used throughout the master's program.

 

A particular effort will be made to prove many results from first principles, while keeping the exposition as simple as possible. This naturally leads to a choice of key results that showcase, in simple but relevant instances, the important concepts in learning theory. Some general results will also be presented without proof.

 

The class will be organized into nine three-hour sessions, each devoted to a precise topic (a chapter from the book in preparation, "Learning Theory from First Principles"). See the tentative schedule below. Credit: 4 ECTS.

 

Prerequisites: We will prove results in class, so a good knowledge of undergraduate mathematics is important, as well as basic notions of probability. Having followed an introductory class on machine learning is beneficial.

 



Dates

All classes will take place in person at the auditorium of PariSanté Campus, on Thursdays between 9 am and 12.15 pm.

The class will follow the book (final draft available here).

Students will benefit more from each class if they read the corresponding sections beforehand.
 

Each entry below gives the date, the topics covered, the corresponding book chapter, and the associated exercises (with a link to the board when available).

September 19

Learning with infinite data (population setting)
-Decision theory (loss, risk, optimal predictors)
-Decomposition of excess risk into approximation and estimation errors (written out below)
-No free lunch theorems
-Basic notions of concentration inequalities (McDiarmid, Hoeffding, Bernstein)

Chapter 2

Exercises 2.1, 2.2
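
As a pointer for this first session, the approximation/estimation split listed above can be written out as follows for a predictor chosen within a class of functions; this is a standard decomposition, and the notation used here may differ slightly from the book's.

```latex
% Excess risk of a predictor \hat{f} chosen from a class \mathcal{F},
% relative to the Bayes (optimal) predictor f^*; standard decomposition,
% notation may differ slightly from the book's.
\[
R(\hat{f}) - R(f^*)
  = \underbrace{R(\hat{f}) - \inf_{f \in \mathcal{F}} R(f)}_{\text{estimation error}}
  \; + \;
  \underbrace{\inf_{f \in \mathcal{F}} R(f) - R(f^*)}_{\text{approximation error}}.
\]
```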


October 3

Linear Least-squares regression
-Guarantees in the fixed design setting (simple closed-form analysis)
-Ridge regression: dimension-independent bounds (see the sketch below)
-Guarantees in the random design setting
-Lower bounds on performance

Chapter 3
Board
Exercises 3.2, 3.6
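
As a small preview of the ridge regression estimator studied in this session, here is a minimal NumPy sketch of its closed-form expression; the synthetic data and the value of the regularization parameter lam are illustrative choices, not taken from the book.

```python
import numpy as np

def ridge_regression(X, y, lam):
    """Closed-form ridge estimator: argmin_w (1/n) ||y - X w||^2 + lam ||w||^2."""
    n, d = X.shape
    # Solve (X^T X / n + lam I) w = X^T y / n.
    A = X.T @ X / n + lam * np.eye(d)
    b = X.T @ y / n
    return np.linalg.solve(A, b)

# Illustrative usage on synthetic data (dimensions, noise level and lam are arbitrary).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
w_true = rng.standard_normal(5)
y = X @ w_true + 0.1 * rng.standard_normal(100)
w_hat = ridge_regression(X, y, lam=0.1)
```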

October 10

Empirical risk minimization
-Convexification of the risk (see the surrogate bound below)
-Risk decomposition
-Estimation error: finite number of hypotheses and covering numbers
-Rademacher complexity
-Penalized problems

Chapter 4
Board
Exercises 4.9, 4.12
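
As a one-line illustration of the convexification step in this session, the 0-1 loss in binary classification can be bounded pointwise by a convex surrogate such as the hinge loss; this is a standard fact, stated here with labels in {-1, 1} and a real-valued predictor f.

```latex
% 0-1 loss bounded by the (convex) hinge loss, for y in {-1, 1} and a
% real-valued predictor f; minimizing the surrogate risk therefore
% controls an upper bound on the misclassification risk.
\[
  1_{\{\, y f(x) \le 0 \,\}} \;\le\; \max\{\, 1 - y f(x),\; 0 \,\}.
\]
```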

October 17

Optimization for machine learning
-Gradient descent
-Stochastic gradient descent (sketch below)
-Generalization bounds through stochastic gradient descent

Chapter 5
Board
Exercises 5.1, 5.9
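
As an illustration of stochastic gradient descent for this session, below is a minimal NumPy sketch on a least-squares objective; the constant step size, number of passes, and synthetic data are arbitrary illustrative choices and make no attempt to match the step-size schedules analyzed in the book.

```python
import numpy as np

def sgd_least_squares(X, y, step=0.05, n_passes=10, seed=0):
    """Plain stochastic gradient descent on the least-squares objective
    (1/n) sum_i (x_i^T w - y_i)^2, sampling one observation at a time.
    The constant step size and number of passes are illustrative choices."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_passes):
        for i in rng.permutation(n):
            grad_i = 2.0 * (X[i] @ w - y[i]) * X[i]  # gradient of the i-th loss
            w = w - step * grad_i
    return w

# Illustrative usage on synthetic noiseless data.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = X @ rng.standard_normal(5)
w_hat = sgd_least_squares(X, y)
```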

October 24

Local averaging techniques
-Partition estimators
-Nadaraya-Watson estimators
-K-nearest neighbors (sketch below)

Chapter 6
Board
Exercises 6.1, 6.3
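
For the local averaging session, here is a minimal NumPy sketch of a k-nearest-neighbor regressor; the value of k and the synthetic data are illustrative choices.

```python
import numpy as np

def knn_regress(X_train, y_train, X_test, k=5):
    """Predict at each test point by averaging the responses of its k nearest
    training points (Euclidean distance); k is an illustrative choice."""
    preds = np.empty(len(X_test))
    for j, x in enumerate(X_test):
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dists)[:k]
        preds[j] = y_train[nearest].mean()
    return preds

# Illustrative usage on synthetic one-dimensional data.
rng = np.random.default_rng(2)
X_train = rng.uniform(0.0, 1.0, size=(100, 1))
y_train = np.sin(2 * np.pi * X_train[:, 0]) + 0.1 * rng.standard_normal(100)
X_test = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y_pred = knn_regress(X_train, y_train, X_test, k=5)
```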

November 7

Kernel methods
-Kernels and representer theorems (see the sketch below)
-Algorithms
-Universal consistency

Chapter 7
Board
Exercises 7.2, 7.11
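
For the kernel methods session, here is a minimal NumPy sketch of kernel ridge regression with a Gaussian kernel, where the representer theorem reduces the problem to a linear system in n coefficients; the bandwidth, regularization parameter, and synthetic data are illustrative choices.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix with entries exp(-||a - b||^2 / (2 bandwidth^2))."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def kernel_ridge_fit(X, y, lam=0.1, bandwidth=1.0):
    """Representer theorem: the estimator has the form f(x) = sum_i alpha_i k(x_i, x),
    with alpha = (K + n lam I)^{-1} y for the objective
    (1/n) sum_i (y_i - f(x_i))^2 + lam ||f||_H^2 (lam and bandwidth are illustrative)."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def kernel_ridge_predict(X_train, alpha, X_test, bandwidth=1.0):
    """Evaluate the fitted function on new points using the coefficients alpha."""
    return gaussian_kernel(X_test, X_train, bandwidth) @ alpha

# Illustrative usage on synthetic data.
rng = np.random.default_rng(3)
X = rng.uniform(-1.0, 1.0, size=(50, 1))
y = np.sign(X[:, 0]) + 0.1 * rng.standard_normal(50)
alpha = kernel_ridge_fit(X, y, lam=0.01, bandwidth=0.2)
X_new = np.linspace(-1.0, 1.0, 10).reshape(-1, 1)
y_pred = kernel_ridge_predict(X, alpha, X_new, bandwidth=0.2)
```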

November 14

Model selection
-L0 penalty
-L1 penalty (see the ISTA sketch below)
-High-dimensional estimation

Chapter 8
Board
Exercises 8.2, 8.6
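
For the L1-penalty part of this session, here is a minimal NumPy sketch of proximal gradient descent (ISTA) for the Lasso; the regularization level, number of iterations, and synthetic data are illustrative choices.

```python
import numpy as np

def soft_threshold(w, tau):
    """Proximal operator of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Proximal gradient descent (ISTA) on (1/(2n)) ||y - X w||^2 + lam ||w||_1,
    with step size 1/L, where L is the Lipschitz constant of the smooth part."""
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / L, with L = sigma_max(X)^2 / n
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Illustrative usage on synthetic data with a sparse underlying vector.
rng = np.random.default_rng(5)
X = rng.standard_normal((100, 20))
w_true = np.zeros(20)
w_true[:3] = [1.0, -2.0, 0.5]
y = X @ w_true + 0.1 * rng.standard_normal(100)
w_hat = lasso_ista(X, y, lam=0.1)
```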

November 21

Neural networks
-Single-hidden-layer neural networks (sketch below)
-Estimation error
-Approximation properties and universality

Chapter 9
Board
Exercise 9.1
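
As an illustration of the single-hidden-layer architecture studied in this session, here is a minimal NumPy sketch of the corresponding predictor; for simplicity the hidden weights are kept random and only the output layer is fitted by regularized least squares (a random-features simplification, not the training procedure discussed in class), and all sizes and parameters are arbitrary.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def hidden_features(X, W, b):
    """Hidden-layer activations sigma(W x + b) for a single hidden layer with ReLU units."""
    return relu(X @ W.T + b)

# Illustrative usage: random hidden weights, output weights fitted by regularized
# least squares (random-features simplification; all sizes and parameters arbitrary).
rng = np.random.default_rng(4)
n, d, m = 200, 2, 50                      # samples, input dimension, hidden units
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
W = rng.standard_normal((m, d))           # hidden-layer weights (kept fixed here)
b = rng.standard_normal(m)                # hidden-layer biases
Phi = hidden_features(X, W, b)            # n x m feature matrix
eta = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(m), Phi.T @ y)  # output weights
y_hat = Phi @ eta                         # predictions f(x) = sum_j eta_j sigma(w_j^T x + b_j)
```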

December 5

Overparameterized models

Chapter 12

December 18

Exam

 


Evaluation

Take-home exercises, one per class, to be sent before the next class (25%). One written in-class exam (75%).
Extra points for writing up LaTeX solutions to exercises from the textbook.

 

Procedure for sending exercise solutions: Send your PDF file to fbachorsay2017@gmail.com, before the end of the next class, always with *exactly* the same subject: homework. Do not forget to include your name in the PDF.

 

Solutions to the exercises will be posted here one week after they are due.