{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# TD3: Gradient descent for logistic regression"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this practical session, we will implement the gradient descent (GD) and stochastic gradient descent (SGD) to solve a simple classification problem."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We consider synthetic training data $(X,Y)$ distributed as follows: the label $Y$ takes the values $-1$ and $1$ with equal probability and $X$ is $d$-dimensional, such that $P(X|Y=-1)$ is uniform in $[0,1]^d$ and $P(X|Y=1)$ is the same but translated by the vector $v=(1,0,\\dots,0) \\in \\mathbb{R}^d$."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. In dimension $d=100$, generate an i.i.d. training set of size $n=20$ and store it. Generate a test set of size $n=100$ and store it. Make a 2d plot of the training set representing the two first coordinates, using different colors for the two classes. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
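{
"cell_type": "markdown",
"metadata": {},
"source": [
"A possible sketch for question 1 (one solution among many; the helper `make_data` and the fixed seed are our own choices):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"rng = np.random.default_rng(0)  # fixed seed, for reproducibility only\n",
"\n",
"def make_data(n, d=100):\n",
"    \"\"\"Sample n i.i.d. pairs (x, y) from the distribution described above.\"\"\"\n",
"    y = rng.choice([-1, 1], size=n)  # labels +/-1 with equal probability\n",
"    x = rng.uniform(size=(n, d))     # uniform in [0,1]^d\n",
"    x[:, 0] += (y == 1)              # class +1 is shifted by v = (1,0,...,0)\n",
"    return x, y\n",
"\n",
"x_train, y_train = make_data(20)\n",
"x_test, y_test = make_data(100)\n",
"\n",
"# 2d scatter plot of the first two coordinates, one color per class\n",
"for label, color in [(-1, \"tab:blue\"), (1, \"tab:red\")]:\n",
"    mask = y_train == label\n",
"    plt.scatter(x_train[mask, 0], x_train[mask, 1], c=color, label=f\"y = {label}\")\n",
"plt.xlabel(\"x1\")\n",
"plt.ylabel(\"x2\")\n",
"plt.legend()\n",
"plt.show()"
]
},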
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. What is the Bayes classifier for the 0-1 loss for this distribution? What is its risk for the 0-1 loss?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
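{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to see the answer to question 2 (a sketch of the reasoning): the support of $P(X|Y=-1)$ is $[0,1]^d$ while the support of $P(X|Y=1)$ is $[1,2]\\times[0,1]^{d-1}$, and these two sets intersect only on the hyperplane $\\{x_1=1\\}$, which has zero Lebesgue measure. A Bayes classifier is therefore $g^*(x)=\\mathrm{sign}(x_1-1)$ (with an arbitrary choice on $\\{x_1=1\\}$), and its risk for the 0-1 loss is $0$."
]
},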
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. GD for logistic regression with $\\ell_2$-regularization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We consider the following regularized empirical risk objective, for $w\\in \\mathbb{R}^d$,\n",
"$$\n",
"F(w) = \\frac1n \\sum_{i=1}^n \\log(1+\\exp(-y_i z_i^\\top w)) + \\lambda \\sum w_j^2.\n",
"$$\n",
"where $z_i=(1,x_i)$ (as usual, to allow for affine classifier instead of just linear)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3. Implement gradient descent with a fixed step-size to solve this problem, initializing with $w_0=0$. Make a plot showing the evolution of the training loss and the test error (i.e. the 0-1 loss). Make these plots for $\\lambda=1$, $\\lambda=0.1$ and $\\lambda=0.01$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
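{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch for question 3, reusing the data generated above; the helpers `add_intercept`, `loss`, `grad`, `test_error`, the step-size and the iteration count are our own choices and may need tuning:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def add_intercept(x):\n",
"    \"\"\"Map each x_i to z_i = (1, x_i).\"\"\"\n",
"    return np.hstack([np.ones((x.shape[0], 1)), x])\n",
"\n",
"z_train, z_test = add_intercept(x_train), add_intercept(x_test)\n",
"\n",
"def loss(w, z, y, lam):\n",
"    \"\"\"l2-regularized logistic loss F(w); the intercept w[0] is not penalized.\"\"\"\n",
"    margins = y * (z @ w)\n",
"    return np.mean(np.logaddexp(0.0, -margins)) + lam * np.sum(w[1:] ** 2)\n",
"\n",
"def grad(w, z, y, lam):\n",
"    \"\"\"Gradient of the objective at w.\"\"\"\n",
"    margins = y * (z @ w)\n",
"    coef = -y / (1.0 + np.exp(margins))  # d/dm log(1+exp(-m)) = -1/(1+exp(m))\n",
"    g = z.T @ coef / len(y)\n",
"    reg = 2 * lam * w\n",
"    reg[0] = 0.0  # intercept not regularized\n",
"    return g + reg\n",
"\n",
"def test_error(w, z, y):\n",
"    \"\"\"0-1 loss of the linear classifier sign(z^T w).\"\"\"\n",
"    return np.mean(np.sign(z @ w) != y)\n",
"\n",
"n_iter, step = 500, 1.0  # fixed step-size, to be tuned (see question 4)\n",
"for lam in [1.0, 0.1, 0.01]:\n",
"    w = np.zeros(z_train.shape[1])\n",
"    losses, errors = [], []\n",
"    for t in range(n_iter):\n",
"        losses.append(loss(w, z_train, y_train, lam))\n",
"        errors.append(test_error(w, z_test, y_test))\n",
"        w -= step * grad(w, z_train, y_train, lam)\n",
"    plt.plot(losses, label=f\"train loss, lambda={lam}\")\n",
"    plt.plot(errors, \"--\", label=f\"test error, lambda={lam}\")\n",
"plt.xlabel(\"iteration\")\n",
"plt.legend()\n",
"plt.show()"
]
},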
{
"cell_type": "markdown",
"metadata": {},
"source": [
"4. Write short comments on:\n",
"(i) your choice of step-size,\n",
"(ii) the evolution of the convergence speed vs lambda, and \n",
"(iii) the evolution of the end-of-training test error vs lambda (does regularization improves generalization here?)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
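{
"cell_type": "markdown",
"metadata": {},
"source": [
"A standard fact that can guide point (i), stated here without proof: the map $m \\mapsto \\log(1+\\exp(-m))$ has second derivative at most $1/4$, so $F$ is $L$-smooth with\n",
"$$\n",
"L \\le \\frac{1}{4n}\\|Z\\|_{\\mathrm{op}}^2 + 2\\lambda,\n",
"$$\n",
"where $Z \\in \\mathbb{R}^{n\\times(d+1)}$ stacks the $z_i$. Gradient descent with a fixed step-size $\\eta \\le 1/L$ is then guaranteed to decrease $F$ at every iteration, so $\\eta = 1/L$ is a safe (if conservative) choice."
]
},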
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. GD for logistic regression with $\\ell_1$-regularization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We consider the following regularized empirical risk objective, for $w\\in \\mathbb{R}^d$,\n",
"$$\n",
"F(w) = \\frac1n \\sum_{i=1}^n \\log(1+\\exp(-y_i z_i^\\top w)) + \\lambda \\sum_{j=1}^d |w_j|.\n",
"$$\n",
"This is a non-smooth problem."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"5. Implement gradient descent with a decreasing step-size in $O(1/\\sqrt{t+1})$ to solve this problem, initializing with $w_0=0$. Make plots showing the evolution of the training loss and test the error (0-1 loss). Make these plots for $\\lambda=0.1$, $\\lambda=0.01$ and $\\lambda=0.001$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
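{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch for question 5, reusing `z_train`, `z_test` and `test_error` from the question-3 sketch; since the objective is non-smooth, the descent direction below is a subgradient, and the initial step-size `step0` is our own choice:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def l1_loss(w, z, y, lam):\n",
"    \"\"\"l1-regularized logistic loss; the intercept w[0] is not penalized.\"\"\"\n",
"    margins = y * (z @ w)\n",
"    return np.mean(np.logaddexp(0.0, -margins)) + lam * np.sum(np.abs(w[1:]))\n",
"\n",
"def l1_subgrad(w, z, y, lam):\n",
"    \"\"\"A subgradient of the l1-regularized objective at w.\"\"\"\n",
"    margins = y * (z @ w)\n",
"    coef = -y / (1.0 + np.exp(margins))\n",
"    g = z.T @ coef / len(y)\n",
"    reg = lam * np.sign(w)  # a valid element of the subdifferential of the l1 term\n",
"    reg[0] = 0.0            # intercept not regularized\n",
"    return g + reg\n",
"\n",
"n_iter, step0 = 1000, 1.0\n",
"for lam in [0.1, 0.01, 0.001]:\n",
"    w = np.zeros(z_train.shape[1])\n",
"    losses, errors = [], []\n",
"    for t in range(n_iter):\n",
"        losses.append(l1_loss(w, z_train, y_train, lam))\n",
"        errors.append(test_error(w, z_test, y_test))\n",
"        w -= step0 / np.sqrt(t + 1) * l1_subgrad(w, z_train, y_train, lam)\n",
"    plt.plot(losses, label=f\"train loss, lambda={lam}\")\n",
"    plt.plot(errors, \"--\", label=f\"test error, lambda={lam}\")\n",
"plt.xlabel(\"iteration\")\n",
"plt.legend()\n",
"plt.show()"
]
},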
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. SGD for logistic regression with $\\ell_1$-regularization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"7. For the same objective function, implement stochastic gradient descent (picking uniformly at random a single example $(x_i,y_i)$ from the training set at each iteration). Looking at the averaged iterates $\\bar w_t = (1/t)\\sum_{s=0}^t w_s$, compare the convergence speed of SGD vs GD by ploting the evolution of the training loss vs $t$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
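{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch for question 6, reusing the helpers above; the value of `lam`, the iteration count, and the decision to run the GD baseline with the same decreasing step-size (so that only the gradient estimate differs) are our own choices:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"n_iter, step0, lam = 2000, 1.0, 0.01\n",
"\n",
"# SGD: one uniformly sampled example per iteration, with iterate averaging\n",
"w = np.zeros(z_train.shape[1])\n",
"w_bar = w.copy()\n",
"sgd_losses = []\n",
"for t in range(n_iter):\n",
"    i = rng.integers(len(y_train))  # pick a single example uniformly at random\n",
"    g = l1_subgrad(w, z_train[i:i + 1], y_train[i:i + 1], lam)\n",
"    w = w - step0 / np.sqrt(t + 1) * g\n",
"    w_bar = ((t + 1) * w_bar + w) / (t + 2)  # running average of w_0, ..., w_{t+1}\n",
"    sgd_losses.append(l1_loss(w_bar, z_train, y_train, lam))\n",
"\n",
"# GD baseline on the same objective, for comparison\n",
"w_gd = np.zeros(z_train.shape[1])\n",
"gd_losses = []\n",
"for t in range(n_iter):\n",
"    w_gd = w_gd - step0 / np.sqrt(t + 1) * l1_subgrad(w_gd, z_train, y_train, lam)\n",
"    gd_losses.append(l1_loss(w_gd, z_train, y_train, lam))\n",
"\n",
"plt.plot(sgd_losses, label=\"SGD (averaged iterates)\")\n",
"plt.plot(gd_losses, label=\"GD\")\n",
"plt.xlabel(\"iteration t\")\n",
"plt.ylabel(\"train loss\")\n",
"plt.legend()\n",
"plt.show()"
]
},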
{
"cell_type": "markdown",
"metadata": {},
"source": [
"8. Comment. Which one converges faster? What is the advantage of SGD?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}