Variational auto-encoder

It is a popular instantiation of auto-encoding variational Bayes, in which we combine:

  1. Autoencoding ELBO reformulation

  2. Black box variational inference approach

  3. Reparametrization based low variance gradient estimator

This algorithm is applicable to any deep generative model \(p_{\theta}\) with latent variables that is differentiable in \(\theta\). The model \(p\) is parametrized as:

\[\begin{split} p(x|z) = N(x| \vec{\mu}(z), \text{diag}(\vec{\sigma}(z))^2) \\ p(z) = N(z| 0, I) \end{split}\]
  • \(\vec{\mu}(z), \vec{\sigma}(z)\) are neural networks parametrized by \(\theta\)
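
For concreteness, here is a minimal sketch of such a decoder network in PyTorch (the framework choice, the class name `GaussianDecoder`, the hidden width, and the decision to output \(\log \vec{\sigma}(z)\) so that the standard deviation stays positive are all illustrative assumptions):

```python
import torch.nn as nn

class GaussianDecoder(nn.Module):
    """Maps a latent code z to the parameters of p(x|z) = N(mu(z), diag(sigma(z))^2)."""

    def __init__(self, latent_dim, data_dim, hidden_dim=128):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(latent_dim, hidden_dim), nn.ReLU())
        self.mu = nn.Linear(hidden_dim, data_dim)
        self.log_sigma = nn.Linear(hidden_dim, data_dim)  # log-parametrization keeps sigma > 0

    def forward(self, z):
        h = self.hidden(z)
        return self.mu(h), self.log_sigma(h).exp()  # mu(z), sigma(z)
```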

The model for \(q\) is:

\[ q(z|x) = N(z| \vec{\mu}(x), \text{diag}(\vec{\sigma}(x))^2) \]

As before, \(\vec{\mu}(x), \vec{\sigma}(x)\) are parametrized by neural networks, this time with parameters \(\phi\).
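
The encoder that produces \(\vec{\mu}(x), \vec{\sigma}(x)\) can be sketched the same way (same illustrative assumptions as the decoder above):

```python
import torch.nn as nn

class GaussianEncoder(nn.Module):
    """Maps an observation x to the parameters of q(z|x) = N(mu(x), diag(sigma(x))^2)."""

    def __init__(self, data_dim, latent_dim, hidden_dim=128):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(data_dim, hidden_dim), nn.ReLU())
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.log_sigma = nn.Linear(hidden_dim, latent_dim)  # log-parametrization keeps sigma > 0

    def forward(self, x):
        h = self.hidden(x)
        return self.mu(h), self.log_sigma(h).exp()  # mu(x), sigma(x)
```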

These choices for \(p\) and \(q\) allow us to simplify the auto-encoding ELBO: we can now use a closed-form expression to compute the regularization term and Monte Carlo estimates for the reconstruction term.
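
Concretely, with \(q(z|x) = N(z| \vec{\mu}(x), \text{diag}(\vec{\sigma}(x))^2)\) and \(p(z) = N(z| 0, I)\), the regularization term has the standard closed form (writing \(\mu_j, \sigma_j\) for the components of \(\vec{\mu}(x), \vec{\sigma}(x)\) and \(d\) for the latent dimension):

\[ D_{KL}(q(z|x) \,\|\, p(z)) = \frac{1}{2} \sum_{j=1}^{d} \left( \mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1 \right) \]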

We may interpret the variational autoencoder as a directed latent-variable probabilistic graphical model. We may also view it as a particular objective for training an auto-encoder neural network; unlike previous approaches, this objective derives reconstruction and regularization terms from a more principled, Bayesian perspective.

Reparametrization based low variance gradient estimator

Under certain conditions, we may express the distribution \(q_{\phi}(z|x)\) as a two-step generative process:

  1. Sample a noise variable \(\epsilon \sim p(\epsilon)\) from a simple distribution, e.g. a standard normal \(N(0, 1)\).

  2. Apply a deterministic transformation \(g_{\phi}(\epsilon, x)\) that maps the random noise into a sample from a more complex distribution: \(z = g_{\phi}(\epsilon, x)\).

For many interesting classes of \(q_{\phi}\), it is possible to choose a \(g_{\phi}(\epsilon, x)\) such that \(z = g_{\phi}(\epsilon, x)\) will be distributed according to \(q_{\phi}(z|x)\).

Gaussian random variables provide the simplest example of the reparametrization trick.

\[ z = g_{\mu, \sigma}(\epsilon) = \mu + \epsilon \cdot \sigma \]
  • \(\epsilon \sim N(0,1)\)

  • \(z\) is also Gaussian, with mean \(\mu\) and standard deviation \(\sigma\)
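
A minimal sketch of this two-step sampling procedure in PyTorch (the particular values of \(\mu, \sigma\) and the sample count are arbitrary and only serve to check that the samples have the intended mean and standard deviation):

```python
import torch

mu, sigma = torch.tensor(1.5), torch.tensor(0.5)

eps = torch.randn(100_000)   # step 1: epsilon ~ N(0, 1)
z = mu + sigma * eps         # step 2: z = g_{mu,sigma}(epsilon), so z ~ N(mu, sigma^2)

print(z.mean().item(), z.std().item())  # roughly 1.5 and 0.5
```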

We can then express the gradient with respect to \(\phi\) of an expectation over \(z \sim q_{\phi}(z)\), for any \(f\), as:

\[ \nabla_{\phi} E_{z \sim q_{\phi}(z|x)}[f(x,z)] = \nabla_{\phi} E_{\epsilon \sim p(\epsilon)}[f(x, g_{\phi}(\epsilon, x))] = E_{\epsilon \sim p(\epsilon)}[\nabla_{\phi} f(x, g_{\phi}(\epsilon, x))] \]

The gradient is now inside the expectation, so we may take Monte Carlo samples to estimate the right-hand term. This approach has much lower variance than the score function estimator and lets us learn models that we otherwise could not.
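
The sketch below shows how an autodiff framework exploits this identity; the test function \(f\) and all constants are assumptions chosen purely for illustration, and the gradient is obtained by differentiating through \(g_{\phi}\) applied to sampled noise:

```python
import torch

# Variational parameters phi = (mu, log_sigma) of q_phi(z) = N(mu, sigma^2)
mu = torch.tensor(0.0, requires_grad=True)
log_sigma = torch.tensor(0.0, requires_grad=True)

def f(z):
    # An arbitrary differentiable function of z (the dependence on x is omitted for brevity)
    return (z - 2.0) ** 2

eps = torch.randn(10_000)           # epsilon ~ p(epsilon) = N(0, 1)
z = mu + log_sigma.exp() * eps      # z = g_phi(epsilon), differentiable in phi
estimate = f(z).mean()              # Monte Carlo estimate of E_{z ~ q_phi}[f(z)]
estimate.backward()                 # the gradient flows through g_phi into mu and log_sigma

print(mu.grad, log_sigma.grad)      # close to the exact values 2(mu - 2) = -4 and 2*sigma^2 = 2
```

In contrast, the score function estimator would use \(E_{z \sim q_{\phi}}[f(x,z) \nabla_{\phi} \log q_{\phi}(z|x)]\), whose Monte Carlo estimates typically have much higher variance.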