{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sampling with Stochastic Gradient Langevin Dynamics \n",
"\n",
"Ulysse Marteau-Ferey, Umut Simsekli"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The goal of this practical session is to implement the Stochastic Gradient Langevin Dynamics (SGLD -- Welling & Teh, ICML 2011) algorithm in order to draw samples from the posterior distribution of a Gaussian probabilistic model.\n",
"\n",
"While this task might seem \"synthetic\" at a first glance, it has an important role in scientific studies: We will design the probabilistic model in a way that the posterior distribution can be computed analytically. This approach will enable us to measure the performance of our algorithm, since the answer that we would like to find in the beginning is already known; hence, we can compare our results to the \"ground truth\". This is very standard practice in academic studies.\n",
"\n",
"The roadmap for the TD is as follows:\n",
"\n",
"- After defining the probabilistic model, we will first generate a dataset from it.\n",
"- We will compute the posterior distribution / posterior expectations in closed form (which will be used for measuring the performance of SGLD later on)\n",
"- We will implement the SGLD algorithm measure its performance with respect changing hyperparameters (step-size, batch-size)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The proabilistic model\n",
"\n",
"Let $\\theta$ be a random variable in $\\mathbb{R}$, $n \\in \\mathbb{N}_+$ be fixed and $\\{a_i \\in \\mathbb{R}\\}_{i=1}^n$ be a **given** sequence of real numbers. We will consider the following probabilistic model. \n",
"\n",
"\\begin{align}\n",
"\\theta \\sim& \\mathcal{N}(0,\\sigma_\\theta^2) \\\\\n",
"y_i | \\theta \\sim& \\mathcal{N}(a_i \\theta,\\sigma_Y^2), \\qquad \\text{for } i=1,\\dots,n ,\n",
"\\end{align}\n",
"where $\\mathcal{N}$ denotes the Gaussian distribution. To make the notation clear, if a random variable $X\\sim \\mathcal{N}(\\mu, \\sigma^2)$, then it admits the following probabilit density function:\n",
"\\begin{align}\n",
"p_X(x) = \\frac1{\\sqrt{2\\pi \\sigma^2}} \\exp(- \\frac1{2\\sigma^2}(x-\\mu)^2)\n",
"\\end{align}"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [],
"source": [
"# Required imports\n",
"\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import scipy.stats as stats\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q1 - Data Generation\n",
"\n",
"- Set $n = 1000$, $\\sigma_\\theta^2 = 10$, $\\sigma_Y^2 = 2$.\n",
"- Generate random $\\{a_i\\}_{i=1}^n$ such that $a_i \\sim \\mathcal{N}(0, 1)$. \n",
"- By fixing the values for $a_i$s, simulate the probabilistic model\n",
" - First generate a $\\theta_{\\text{true}}$ from the prior\n",
" - Then fix $\\theta_{\\text{true}}$ and generate $\\{y_i\\}_{i=1}^n$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q2 - Computing the Posterior in Closed-Form\n",
"\n",
"By denoting the prior distribution by $p(\\theta)$ and the likelihood by $p(D_n|\\theta)$, such that $D_n = \\{y_1,\\dots, y_n\\}$, compute the posterior distribution $p(\\theta|D_n)$ in analytical form.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q3 - Running SGLD\n",
"\n",
"At this step, we will implement the SGLD algorithm to draw approximate samples from $p(\\theta|D_n)$ by only using the prior density and the likelihood (i.e, we will not use the analytical form that we computed in the previous step). \n",
"\n",
"The SGLD algorithm is based on the following simple recursion:\n",
"\n",
"\\begin{align}\n",
"\\theta_{k+1} = \\theta_{k} - \\eta \\nabla \\hat{U}_{k+1} (\\theta_k) + \\sqrt{2\\eta} Z_{k+1},\n",
"\\end{align}\n",
"\n",
"where, $\\eta>0$ is a small step-size, $Z_{k} \\sim \\mathcal{N}(0,1)$ is a standard Gaussian random variable (for all $k$), and \n",
"\n",
"$$\\hat{U}_k(\\theta) := \\hat{U}_{\\Omega_k}(\\theta) := -\\frac{n}{b} \\sum_{i \\in \\Omega_k} \\log p(y_i|\\theta) - \\log p(\\theta). $$\n",
"\n",
"Here, $\\Omega_k \\subset \\{1,\\dots,n\\}$ is a random data subsample that is drawn with or without replacement at each iteration, and $b = |\\Omega_k|$ denotes the number of elements in $\\Omega_k$, also known as batch-size.\n",
"\n",
"\n",
"- Implement the SGLD algorithm for the given model\n",
"- Set $\\eta = 0.001$, $b = n$ (use all the data points at every iteration) and run the algorithm for $K=10000$ iterations. Discard the first $K_{\\text{burn in}} = 5000$ (this is called the \"burn-in period\" -- also think about why this is necessary) iterations and plot the histogram of the last $5000$ iterates. Also plot the true density of the posterior distribution. Do they look similar? \n",
"\n",
"Assume that we want to estimate the second-moment of the posterior distribution, i.e.\n",
"\n",
"$$ \\mathbb{E}_{\\theta \\sim p(\\theta|D_n)} [\\theta^2] = \\int_{\\mathbb{R}} \\theta^2 p(\\theta|D_n) d\\theta. $$\n",
"\n",
"Now consider the corresponding SGLD estimator: (why does this make sense?)\n",
"\n",
"$$ \\hat{\\theta^2} := \\frac1{K - K_{\\text{burn in}}} \\sum_{k = K_{\\text{burn in}} +1}^K \\theta_k^2. $$\n",
"\n",
"Does the SGLD estimator provide reasonable estimates?\n",
"\n",
"Play with the parameters $\\eta$ and $b$. What do you observe?\n"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}