Variational inference
We try to approximate the posterior using a distribution \(q(x)\) from a tractable family.
We assume that \(q\) has some free parameters that we can optimize to make the approximation as tight as possible.
To do this we usually minimize the KL divergence:

\[
\mathbb{KL}\left(p^{*} \,\|\, q\right) = \sum_x p^{*}(x) \log \frac{p^{*}(x)}{q(x)}
\]
\(p^*(x)\) is our posterior distribution.
This is hard to compute, since it requires taking an expectation with respect to the intractable \(p^{*}(x)\). Instead we work with the reverse KL divergence

\[
\mathbb{KL}\left(q \,\|\, p^{*}\right) = \sum_x q(x) \log \frac{q(x)}{p^{*}(x)}.
\]
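To make the difference concrete, here is a minimal NumPy sketch (the numbers are made up for illustration; in a real problem \(p^{*}\) would not be available in normalized form) that evaluates both KL directions on a small discrete distribution. Note which distribution each expectation is taken under.

```python
import numpy as np

# Toy, fully known distributions -- purely illustrative.
p_star = np.array([0.5, 0.3, 0.2])   # stand-in for the true posterior p*(x)
q = np.array([0.6, 0.25, 0.15])      # tractable approximation q(x)

# Forward KL(p* || q): the sum is weighted by p*, i.e. an expectation under p*.
forward_kl = np.sum(p_star * np.log(p_star / q))

# Reverse KL(q || p*): the sum is weighted by q, i.e. an expectation under q.
reverse_kl = np.sum(q * np.log(q / p_star))

print(f"forward KL = {forward_kl:.4f}, reverse KL = {reverse_kl:.4f}")
```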
If we choose \(q\) from a suitable family, the expectations under \(q\) in the reverse KL are tractable. But evaluating \(p^*(x) = p(x|D)\) point-wise is still hard, since it requires evaluating the normalization constant \(Z = p(D)\). We therefore have to work with the un-normalized distribution

\[
\tilde{p}(x) = p(x, D) = p^{*}(x)\, Z.
\]
The objective function becomes (with a slight abuse of notation, since \(\tilde{p}\) is not normalized):

\[
J(q) = \mathbb{KL}\left(q \,\|\, \tilde{p}\right) = \sum_x q(x) \log \frac{q(x)}{\tilde{p}(x)} = \mathbb{KL}\left(q \,\|\, p^{*}\right) - \log Z.
\]
Here \(Z\) is a constant, so by minimizing \(J(q)\) we force \(q\) to become close to \(p^*\).
Since the KL divergence is non-negative, \(J(q) \ge -\log Z = -\log p(D)\), so \(J(q)\) can be viewed as an upper bound on the NLL (negative log likelihood).
Equivalently, by minimizing \(J(q)\) we maximize a lower bound on the log marginal likelihood \(\log p(D)\). Because of this, \(-J(q)\) is called the variational lower bound or evidence lower bound (ELBO).
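Expanding \(J(q)\) with \(\tilde{p}(x) = p(x, D)\) gives the form of the ELBO that is usually optimized in practice (a standard rearrangement, stated here for completeness):

\[
-J(q) = \sum_x q(x) \log \frac{\tilde{p}(x)}{q(x)} = \mathbb{E}_{q}\left[\log p(x, D)\right] + \mathbb{H}\left(q\right) \le \log p(D),
\]

i.e. the expected log joint plus the entropy of \(q\), with equality exactly when \(q = p^*\).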
Mean field method
Mean field is one of the most popular variational inference methods. It assumes a fully factorized approximation to the posterior of the form

\[
q(x) = \prod_i q_i(x_i).
\]
Our goal is then to solve the optimization problem

\[
\min_{\{q_i\}} \mathbb{KL}\left(q \,\|\, \tilde{p}\right),
\]

where we optimize over each marginal \(q_i\).
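As a concrete illustration (a toy example, not taken from the text), the sketch below runs coordinate-ascent mean field on a two-dimensional Gaussian target, for which the optimal factor updates are available in closed form. The factorized \(q\) recovers the correct means but underestimates the marginal variances, a well-known property of the reverse KL objective.

```python
import numpy as np

# Toy mean-field (coordinate ascent) example: approximate a correlated 2D Gaussian
# p(z) = N(mu, Sigma) with a factorized Gaussian q(z1) q(z2).
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
Lam = np.linalg.inv(Sigma)           # precision matrix of the target

# For a Gaussian target the optimal factors are Gaussians with fixed variances
# 1 / Lam[i, i]; only their means m[i] need to be iterated.
m = np.zeros(2)
for _ in range(50):
    m[0] = mu[0] - (Lam[0, 1] / Lam[0, 0]) * (m[1] - mu[1])
    m[1] = mu[1] - (Lam[1, 0] / Lam[1, 1]) * (m[0] - mu[0])

print("q means:                ", m)                 # converge to the true means
print("q variances:            ", 1 / np.diag(Lam))  # 0.36 each: too small
print("true marginal variances:", np.diag(Sigma))    # 1.0 each
```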