Variational auto-encoder
It is a popular instantiation of auto-encoding variational Bayes (AEVB), in which we combine the auto-encoding ELBO objective, a reparametrization-based low-variance gradient estimator, and a specific parametrization of \(p\) and \(q\) by neural networks.
This algorithm is applicable to any deep generative model \(p_{\theta}\) with latent variables that is differentiable in \(\theta\). The model \(p\) is parametrized as:
\[p(x \mid z) = \mathcal{N}\big(x;\ \vec{\mu}(z),\ \operatorname{diag}(\vec{\sigma}(z))^{2} I\big), \qquad p(z) = \mathcal{N}(z;\ 0,\ I)\]
where \(\vec{\mu}(z), \vec{\sigma}(z)\) are parametrized neural networks.
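As a concrete illustration, a decoder network producing \(\vec{\mu}(z)\) and \(\vec{\sigma}(z)\) might look like the following minimal sketch. It assumes a PyTorch environment; the class name `Decoder` and the dimensions `latent_dim`, `hidden_dim`, `data_dim` are arbitrary illustrative choices, not fixed by the text.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Maps a latent code z to the parameters (mu(z), sigma(z)) of p(x|z)."""
    def __init__(self, latent_dim=2, hidden_dim=64, data_dim=784):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(latent_dim, hidden_dim), nn.ReLU())
        self.mu = nn.Linear(hidden_dim, data_dim)         # mean of p(x|z)
        self.log_sigma = nn.Linear(hidden_dim, data_dim)  # log std, unconstrained

    def forward(self, z):
        h = self.hidden(z)
        return self.mu(h), self.log_sigma(h).exp()
```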
The model for \(q\) is:
\[q(z \mid x) = \mathcal{N}\big(z;\ \vec{\mu}(x),\ \operatorname{diag}(\vec{\sigma}(x))^{2} I\big)\]
where, similarly, \(\vec{\mu}(x)\) and \(\vec{\sigma}(x)\) are parametrized neural networks.
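The encoder for \(q_{\phi}(z|x)\) mirrors that structure, mapping \(x\) to \(\vec{\mu}(x)\) and \(\vec{\sigma}(x)\); again a minimal sketch under the same assumed PyTorch setup and arbitrary layer sizes.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps a data point x to the parameters (mu(x), sigma(x)) of q(z|x)."""
    def __init__(self, data_dim=784, hidden_dim=64, latent_dim=2):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(data_dim, hidden_dim), nn.ReLU())
        self.mu = nn.Linear(hidden_dim, latent_dim)         # mean of q(z|x)
        self.log_sigma = nn.Linear(hidden_dim, latent_dim)  # log std of q(z|x)

    def forward(self, x):
        h = self.hidden(x)
        return self.mu(h), self.log_sigma(h).exp()
```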
These choices for \(p\) and \(q\) allow us to simplify the auto-encoding ELBO: we can compute the regularization term with a closed-form expression and use Monte-Carlo estimates for the reconstruction term.
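To make this concrete, one possible single-sample estimate of the per-example ELBO is sketched below, with the Gaussian KL regularization term written out in closed form and the reconstruction term estimated by Monte Carlo. It assumes the PyTorch setup above; `mu_q` and `sigma_q` stand for the encoder outputs (tensors) and `decode` for any function returning the parameters of \(p(x|z)\), all hypothetical names.

```python
import torch
from torch.distributions import Normal

def elbo_estimate(x, mu_q, sigma_q, decode):
    """Single-sample ELBO: Monte-Carlo reconstruction term minus closed-form Gaussian KL."""
    # Reparametrized sample z ~ q(z|x) = N(mu_q, diag(sigma_q)^2)
    eps = torch.randn_like(mu_q)
    z = mu_q + sigma_q * eps

    # Reconstruction term: Monte-Carlo estimate of E_q[log p(x|z)]
    mu_x, sigma_x = decode(z)
    recon = Normal(mu_x, sigma_x).log_prob(x).sum(dim=-1)

    # Regularization term: KL(q(z|x) || p(z)) with p(z) = N(0, I), in closed form
    kl = 0.5 * (sigma_q**2 + mu_q**2 - 1.0 - 2.0 * sigma_q.log()).sum(dim=-1)

    return recon - kl
```

Training would then minimize the negative of this estimate averaged over a mini-batch.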
We may interpret the variational autoencoder as a directed latent-variable probabilistic graphical model. We may also view it as a particular objective for training an auto-encoder neural network; unlike previous approaches, this objective derives reconstruction and regularization terms from a more principled, Bayesian perspective.
Reparametrization-based low-variance gradient estimator
Under certain conditions we may express the distribution \(q_{\phi}(z|x)\) as a two-step generative process:
1. Sample a noise variable \(\epsilon\) from a simple fixed distribution, e.g. a standard normal: \(\epsilon \sim p(\epsilon)\).
2. Apply a deterministic transformation \(g_{\phi}(\epsilon, x)\) that maps the random noise onto a more complex distribution: \(z = g_{\phi}(\epsilon, x)\).
For many interesting classes of \(q_{\phi}\), it is possible to choose a \(g_{\phi}(\epsilon,x)\) such that \(z = g_{\phi}(\epsilon,x)\) will be distributed according to \(q_{\phi}(z|x)\).
Gaussian random variables provide the simplest example of the reparametrization trick.
If \(q_{\phi}(z \mid x) = \mathcal{N}(z;\ \mu, \sigma^{2})\), we can sample \(\epsilon \sim \mathcal{N}(0,1)\) and set \(z = g_{\phi}(\epsilon, x) = \mu + \sigma \epsilon\); then \(z\) is also Gaussian, with mean \(\mu\) and variance \(\sigma^{2}\).
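A tiny numerical check of this Gaussian case, again assuming PyTorch (the values \(\mu = 1.5\), \(\sigma = 0.5\) are arbitrary):

```python
import torch

mu, sigma = 1.5, 0.5                    # target q(z) = N(mu, sigma^2)
eps = torch.randn(100_000)              # step 1: eps ~ N(0, 1)
z = mu + sigma * eps                    # step 2: deterministic transform g(eps)

# z should have approximately the requested mean and standard deviation
print(z.mean().item(), z.std().item())  # ~1.5, ~0.5
```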
We can express the gradient of an expectation with respect to \(q_{\phi}(z)\) (for any differentiable \(f\)) as:
\[\nabla_{\phi}\, \mathbb{E}_{z \sim q_{\phi}(z|x)}\big[ f(x, z) \big] = \nabla_{\phi}\, \mathbb{E}_{\epsilon \sim p(\epsilon)}\big[ f(x, g_{\phi}(\epsilon, x)) \big] = \mathbb{E}_{\epsilon \sim p(\epsilon)}\big[ \nabla_{\phi} f(x, g_{\phi}(\epsilon, x)) \big]\]
The gradient is now inside the expectation, so we may take Monte-Carlo samples to estimate the right-hand term. This approach has much lower variance than the score function estimator and lets us learn models that we otherwise could not.
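For example, under the same assumed PyTorch setup, the reparametrized Monte-Carlo gradient can be taken directly with automatic differentiation; the test function \(f(z) = z^{2}\) is a hypothetical choice for which the exact answers \(\partial_{\mu}\, \mathbb{E}[z^{2}] = 2\mu\) and \(\partial_{\sigma}\, \mathbb{E}[z^{2}] = 2\sigma\) are known.

```python
import torch

mu = torch.tensor(1.0, requires_grad=True)
sigma = torch.tensor(0.5, requires_grad=True)

def f(z):
    return z**2  # E_q[f(z)] = mu^2 + sigma^2, so d/dmu = 2*mu, d/dsigma = 2*sigma

# Reparametrized Monte-Carlo estimate of E_q[f(z)] with z = g(eps) = mu + sigma*eps
eps = torch.randn(100_000)
estimate = f(mu + sigma * eps).mean()
estimate.backward()  # gradients flow through the deterministic transform

print(mu.grad.item(), sigma.grad.item())  # close to 2.0 and 1.0
```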