Learning Theory from First Principles
Francis Bach

This web page hosts supporting material for the book (matlab code, Python code, and in the future Julia code and solutions to exercises).

See final draft available

Matlab code

Generic helper functions




Chapter 1: Mathematical preliminaries

Figure 1.1 (expectation of maximum of Gaussian random variables)

Chapter 2: Introduction to supervised learning
Figure 2.1 (polynomial regression with increasing orders - predictions)
Figure 2.2 (polynomial regression with increasing orders - errors)

Chapter 3: Linear least-squares regression
Figure 3.1 (polynomial regression with varying number of observations)
Figure 3.2 (convergence rate for polynomial regression)
Figure 3.3 (polynomial ridge regression)

Chapter 4: Empirical risk minimization
Figure 4.1 (convex surrogates)
Figure 4.2 (optimal score functions for Gaussian class-conditional densities)


Chapter 5: Optimization
Figure 5.1 (gradient descent on two least-squares problems)
Figure 5.2 (comparison of step-sizes for SGD for the support vector machine)
Figure 5.4 (comparison of step-sizes for SGD for logistic regression)


Chapter 6: Local averaging methods
Figure 6.2 (regressogram in one dimension)
Figure 6.3 (k-nearest neighbor in one dimension)
Figure 6.4 (Nadaraya-Watson in one dimension)
Figure 6.5 (learning curves for local averaging)
Figure 6.6 (locally linear partitioning estimate)


Chapter 7: Kernel methods
Figure 7.2 (minimum norm interpolator)
Figure 7.3 (comparison of kernels)


Chapter 8: Sparse methods
Figure 8.1 (regularization path)

Figure 8.2 (comparison of estimators) + script_model_selection.m + script_model_selectionROT.m

Chapter 9: Neural networks
Figure 9.1 (global convergence for different numbers of neurons) +

Figure 9.2 (random features - kernels)

Figure 9.3 (neural networks fitting)


Chapter 10: Ensemble learning
Figure 10.1 (bagged 1-nn estimation)
Figure 10.2 (Gaussian random projections)
Figure 10.3 (Boosting)

Chapter 11: Overparameterized models
Figure 11.1 (logistic regression on separable data)
Figures 11.2 and 11.3 (double descent curves, random non-linear features)
Figure 11.4 (double descent, random linear projections)

Chapter 12: Lower bounds

Chapter 13: Online learning and bandits
Figure 13.1 (zero-th order optimisation)
Figure 13.2 (UCB algorithm)

Chapter 14: Probabilistic methods
Figure 14.3 (MMSE vs. MAP)
Figure 14.4 (Discriminative vs. generative learning)

Chapter 15: Structured prediction
Figure 15.1 (robust regression)


Python code

See https://github.com/fbach2000/Learning_Theory_from_First_Principles


Copyright in this Work has been licensed exclusively to The MIT Press, http://mitpress.mit.edu, which will be releasing the final version to the public in 2024. All inquiries regarding rights should be addressed to The MIT Press, Rights and Permissions Department.