This web page hosts supporting material for the book: MATLAB code and Python code, with Julia code and solutions to exercises to be added in the future.
See the final draft, available here.
Generic helper functions
Chapter 1: Mathematical preliminaries
Figure 1.1 (expectation of maximum of Gaussian random variables)
Chapter 2: Introduction to supervised learning
Figure 2.1 (polynomial regression with increasing orders - predictions)
Figure 2.2 (polynomial regression with increasing orders - errors)
Chapter 3: Linear least-squares regression
Figure 3.1 (polynomial regression with varying number of observations)
Figure 3.2 (convergence rate for polynomial regression)
Figure 3.3 (polynomial ridge regression)
Chapter 4: Empirical risk minimization
Figure 4.1 (convex surrogates)
Figure 4.2 (optimal score functions for Gaussian class-conditional densities)
Chapter 5: Optimization
Figure 5.1 (gradient descent on two least-squares problems)
Figure 5.2 (comparison of step-sizes for SGD for the support vector machine)
Figure 5.4 (comparison of step-sizes for SGD for logistic regression)
Chapter 6: Local averaging methods
Figure 6.2 (regressogram in one dimension)
Figure 6.3 (k-nearest neighbor in one dimension)
Figure 6.4 (Nadaraya-Watson in one dimension)
Figure 6.5 (learning curves for local averaging)
Figure 6.6 (locally linear partitioning estimate)
Chapter 7: Kernel methods
Figure 7.2 (minimum norm interpolator)
Figure 7.3 (comparison of kernels)
Chapter 8: Sparse methods
Figure 8.1 (regularization path)
Figure 8.2 (comparison of estimators) + script_model_selection.m + script_model_selectionROT.m
Chapter 9: Neural networks
Figure 9.1 (global convergence for different numbers of neurons) + launch_training_relu_nn.m
Figure 9.2 (random features - kernels)
Figure 9.3 (neural networks fitting)
Chapter 10: Ensemble learning
Figure 10.1 (bagged 1-nn estimation)
Figure 10.2 (Gaussian random projections)
Figure 10.3 (boosting)
Chapter 11: Overparameterized models
Figure 11.1 (logistic regression on separable data)
Figures 11.2 and 11.3 (double descent curves, random non-linear features)
Figure 11.4 (double descent, random linear projections)
Chapter 12: Lower bounds
Chapter 13: Online learning and bandits
Figure 13.1 (zeroth-order optimization)
Figure 13.2 (UCB algorithm)
Chapter 14: Probabilistic methods
Figure 14.3 (MMSE vs. MAP)
Figure 14.4 (discriminative vs. generative learning)
Chapter 15: Structured prediction
Figure 15.1 (robust regression)
See
https://github.com/fbach2000/Learning_Theory_from_First_Principles
Copyright in this Work has been licensed exclusively to The MIT Press,
http://mitpress.mit.edu, which will be releasing the final version to the
public in 2024. All inquiries regarding rights should be addressed to The MIT
Press, Rights and Permissions Department.