{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# K nearest neighbors and cross-validation\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"For this practical session, we will work on the real data mnist_digits.mat (digits), that can be downloaded from the course web page.\n",
"\n",
"For classification problems with $K$ classes, we call the \"confusion matrix\" associated to data $D_n=(x_t,y_t)$ the matrix $M \\in \\mathbb{N}^{K \\times K}$ such that $M_{i,j}$ is the number of elements with true class $i$ and predicted class $j$.\n",
"\n",
"**NB**: Given that there are more than $66000$ images in the dataset, we only work on a subset of these $66000$ images so as to not go beyond the memory of your computer.\n",
"\n",
"**1) Start by getting acquainted with the data. They are composed of a vector of labels `y` and images of size 28x28, given in matrix `x` of linearized vectors (each line of the matrix `x` corresponds to a single image).**"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"