Francis Bach

This web page hosts supporting material for the book *Learning Theory from First Principles* (Matlab code, Python code, and, in the future, Julia code and solutions to exercises).

The final draft is available here.

**Generic helper functions**

**Chapter 1: Mathematical preliminaries**

Figure 1.1 (expectation of maximum of Gaussian random variables)

**Chapter 2: Introduction to supervised learning**

Figure 2.1 (polynomial regression with increasing orders - predictions)

Figure 2.2 (polynomial regression with increasing orders - errors)
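For a quick impression of Figures 2.1 and 2.2 without downloading the scripts, here is a minimal Python sketch of polynomial regression with increasing orders (the data, seed, and degrees are illustrative choices, not the book's code):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
x = rng.uniform(-1, 1, n)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(n)  # noisy observations

x_test = np.linspace(-1, 1, 200)
train_errs, test_errs = {}, {}
for degree in (1, 3, 9, 15):
    coeffs = np.polyfit(x, y, degree)  # least-squares fit in the monomial basis
    train_errs[degree] = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_errs[degree] = np.mean(
        (np.polyval(coeffs, x_test) - np.sin(2 * np.pi * x_test)) ** 2
    )
    print(f"degree {degree:2d}: "
          f"train {train_errs[degree]:.3f}, test {test_errs[degree]:.3f}")
```

Since the polynomial models are nested, the training error can only decrease with the degree, while the test error eventually grows again, which is the overfitting effect the two figures illustrate.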

**Chapter 3: Linear least-squares regression**

Figure 3.1 (polynomial regression with varying number of observations)

Figure 3.2 (convergence rate for polynomial regression)

Figure 3.3 (polynomial ridge regression)

**Chapter 4: Empirical risk minimization**

Figure 4.1 (convex surrogates)

Figure 4.2 (optimal score functions for Gaussian class-conditional densities)

**Chapter 5: Optimization**

Figure 5.1 (gradient descent on two least-squares problems)

Figure 5.2 (comparison of step-sizes for SGD for the support vector machine)

Figure 5.4 (comparison of step-sizes for SGD for logistic regression)
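The step-size comparison of Figures 5.2 and 5.4 can be sketched in a few lines of Python; this is a simplified stand-in for the book's scripts, with an arbitrary synthetic logistic model and two illustrative step-size rules:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 1000, 5
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
# labels in {-1, +1} drawn from a logistic model
y = np.where(rng.uniform(size=n) < 1 / (1 + np.exp(-X @ w_star)), 1.0, -1.0)

def logistic_loss(w):
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

def sgd(step_rule, n_steps=20000):
    w = np.zeros(d)
    for t in range(n_steps):
        i = rng.integers(n)
        # gradient of log(1 + exp(-y_i x_i^T w)) with respect to w
        g = -y[i] * X[i] / (1 + np.exp(y[i] * X[i] @ w))
        w -= step_rule(t) * g
    return w

w_const = sgd(lambda t: 0.1)                   # constant step size
w_decay = sgd(lambda t: 1.0 / np.sqrt(t + 1))  # decaying step size
print(logistic_loss(w_const), logistic_loss(w_decay))
```

Both runs should drive the loss well below its value at zero (log 2); the figures in the book compare such schedules much more systematically.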

**Chapter 6: Local averaging methods**

Figure 6.2 (regressogram in one dimension)

Figure 6.3 (k-nearest neighbor in one dimension)

Figure 6.4 (Nadaraya-Watson in one dimension)

Figure 6.5 (learning curves for local averaging)

Figure 6.6 (locally linear partitioning estimate)
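As a taste of the local averaging estimators above, here is a minimal Python sketch of the Nadaraya-Watson estimator in one dimension (Gaussian kernel, illustrative target function and bandwidth, not the book's code):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = np.sort(rng.uniform(0, 1, n))
y = np.cos(2 * np.pi * x) + 0.3 * rng.standard_normal(n)

def nadaraya_watson(x_query, x_train, y_train, h):
    # kernel-weighted average of the training responses
    w = np.exp(-((x_query[:, None] - x_train[None, :]) ** 2) / (2 * h ** 2))
    return (w @ y_train) / w.sum(axis=1)

x_grid = np.linspace(0, 1, 100)
y_hat = nadaraya_watson(x_grid, x, y, h=0.05)
mse = np.mean((y_hat - np.cos(2 * np.pi * x_grid)) ** 2)
print(f"MSE on grid: {mse:.4f}")
```

The bandwidth h plays the role of the locality parameter whose bias/variance trade-off drives the learning curves of Figure 6.5.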

**Chapter 7: Kernel methods**

Figure 7.2 (minimum norm interpolator)

Figure 7.3 (comparison of kernels)
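The minimum norm interpolator of Figure 7.2 can be sketched directly from the kernel representer form f(·) = Σᵢ αᵢ k(·, xᵢ) with K α = y; the Gaussian kernel, bandwidth, and target below are illustrative assumptions, not the book's script:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
x = rng.uniform(-1, 1, n)
y = np.sin(3 * x)

def gaussian_kernel(a, b, sigma=0.5):
    # k(s, t) = exp(-(s - t)^2 / (2 sigma^2))
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * sigma ** 2))

# minimum-norm interpolator: solve K alpha = y (lstsq handles the
# near-singular kernel matrix gracefully)
K = gaussian_kernel(x, x)
alpha, *_ = np.linalg.lstsq(K, y, rcond=None)

x_grid = np.linspace(-1, 1, 100)
f_grid = gaussian_kernel(x_grid, x) @ alpha
residual = np.max(np.abs(K @ alpha - y))
print(f"max training residual: {residual:.2e}")
```

The fitted function passes (numerically) through every training point while having the smallest RKHS norm among all interpolants.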

**Chapter 8: Sparse methods**

Figure 8.1 (regularization path)

Figure 8.2 (comparison of estimators) + script_model_selection.m + script_model_selectionROT.m

**Chapter 9: Neural networks**

Figure 9.1 (global convergence for different numbers of neurons) + launch_training_relu_nn.m

Figure 9.2 (random features - kernels)

Figure 9.3 (neural networks fitting)
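The random-features view of Figure 9.2 amounts to drawing the hidden-layer weights once, freezing them, and fitting only the output layer; a short Python sketch with illustrative dimensions and a synthetic target (not the book's script):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, m = 300, 2, 500  # m random ReLU features, more features than samples
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

# random features: phi(x) = max(0, W x + b), drawn once and never trained
W = rng.standard_normal((m, d))
b = rng.standard_normal(m)
Phi = np.maximum(0.0, X @ W.T + b)

# ridge regression on top of the frozen random features
lam = 1e-3
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T @ y)
train_mse = np.mean((Phi @ w - y) ** 2)
print(f"train MSE: {train_mse:.4f}")
```

With m larger than n and a tiny ridge, the model essentially interpolates the training data; as m grows, such random-feature models approximate a fixed kernel method, which is the correspondence the figure illustrates.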

**Chapter 10: Ensemble learning**

Figure 10.1 (bagged 1-NN estimation)

Figure 10.2 (Gaussian random projections)

Figure 10.3 (Boosting)

**Chapter 11: Overparameterized models**

Figure 11.1 (logistic regression on separable data)

Figures 11.2 and 11.3 (double descent curves, random non-linear features)

Figure 11.4 (double descent, random linear projections)

**Chapter 12: Lower bounds**

**Chapter 13: Online learning and bandits**

Figure 13.1 (zeroth-order optimization)

Figure 13.2 (UCB algorithm)
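For intuition about Figure 13.2, here is a minimal sketch of the UCB1 algorithm on a three-armed Bernoulli bandit (arm means, horizon, and bonus constant are illustrative choices, not the book's script):

```python
import numpy as np

rng = np.random.default_rng(4)
means = np.array([0.3, 0.5, 0.7])  # Bernoulli arm means; best arm is index 2
T = 5000
counts = np.zeros(3)
sums = np.zeros(3)

# pull each arm once to initialize the empirical means
for a in range(3):
    counts[a] = 1
    sums[a] = rng.uniform() < means[a]

for t in range(3, T):
    # UCB1 index: empirical mean plus an exploration bonus
    ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
    a = int(np.argmax(ucb))
    counts[a] += 1
    sums[a] += rng.uniform() < means[a]

print(counts)  # the best arm accumulates most of the pulls
```

Suboptimal arms are pulled only O(log T) times, so the best arm dominates the counts, giving logarithmic regret.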

**Chapter 14: Probabilistic methods**

Figure 14.4 (Discriminative vs. generative learning)

**Chapter 15: Structured prediction**

Figure 15.1 (robust regression)

See https://github.com/fbach2000/Learning_Theory_from_First_Principles

Copyright in this Work has been licensed exclusively to The MIT Press,
http://mitpress.mit.edu, which will be releasing the final version to the
public in 2024. All inquiries regarding rights should be addressed to The MIT
Press, Rights and Permissions Department.