Variational auto-encoder

It is an popular instantiation of auto encoding variational bayes. Where we combine:

  1. Autoencoding ELBO reformulation

  2. Black box variational inference approach

  3. Reparametrization based low variance gradient estimator

This algorithm is applicable to any deep generative model pθ with latent variables that is differentiable in θ. The model p is parametrized as:

p(x|z)=N(x|μ(z),diag(σ(z))2)p(z)=N(z|0,I)
  • μ(z),σ(z)) are parametrized neural networks

The model for q is:

q(z|x)=N(z|μ(z),diag(σ(z))2))

Similarly we have neural networks for μ,σ.

These choices for p and q alow simplify the auto-encoding ELBO. We can now use a closed form expression to compute the regularization term and use Monte-Carlo estimates for the reconstruction term.

We may interpret the variational autoencoder as a directed latent-variable probabilistic graphical model. We may also view it as a particular objective for training an auto-encoder neural network; unlike previous approaches, this objective derives reconstruction and regularization terms from a more principled, Bayesian perspective.

Reparametrization based low variance gradient estimator

Under certain conditions we may express the distribution qϕ(z|x) in a two step generative process:

  1. Sample a noise variable ϵ like a standard normal N(0,1) ϵp(ϵ)

  2. Apply deterministic transformation gϕ(ϵ,x) that maps the random noise into a more complex distribution. $z=gϕ(ϵ,x)$

For many interesting classes of qϕ, it is possible to choose a gϕ(ϵ,x) such that z=gϕ(ϵ,x) will be distributed acording to qϕ(z|x).

Gaussian random variables provide the simplest example of the reparametrization trick.

z=gμ,σ(ϵ)=μ+ϵσ
  • ϵN(0,1)

  • z is also gaussian

We can express the gradient of an expectation with respect to q(z) (for any f) as:

ϕEzqϕ[f(x,z)]=ϕEϵp(ϵ[f(x,gϕ(ϵ,x))]=Eϵp(e)[ϕf(x,gϕ(ϵ,x))]

The gradient is now inside the expectation, so we may take Monte Carlo samples to estimate the right-hand term. This approach has much lower variance than score function estimator and helps us to learn models that we otherwise could not.