Latent Variable Models¶
Here we assume that the observed variables are correlated because they arise from a hidden common “cause”. Models with hidden variables are known as latent variable models (LVMs). Our goal is to find the joint distribution over the two kinds of variables:
\(x\) is observed
\(z\) is unobserved
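Assuming the standard factorization into a prior over \(z\) and a likelihood for \(x\), this joint takes the form

\[p(x, z) = p(z)\, p(x|z).\]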
In general, LVMs are harder to fit than models with no latent variables. However, they have significant advantages:
They often have fewer parameters than models that directly represent correlations in the visible space.
The hidden variables in an LVM can serve as a bottleneck, which computes a compressed representation of the data.
Since the latent variables can be used to explain the data-generation process, we are generally interested in the posterior \(p(z|x)\), i.e., in finding out something about the hidden variables given the observed ones.
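Assuming the factorization above, this posterior follows from Bayes' rule (with a discrete \(z\) assumed for the sum):

\[p(z|x) = \frac{p(x|z)\, p(z)}{p(x)} = \frac{p(x|z)\, p(z)}{\sum_{z'} p(x|z')\, p(z')},\]

where the denominator is exactly the marginal likelihood that appears in the learning objective below.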
Learning¶
When we train an LVM, our goal is to maximize the marginal log-likelihood:
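For a dataset \(\mathcal{D} = \{x_n\}_{n=1}^N\) of i.i.d. observations and a discrete latent variable (the setup assumed here, matching the mixture view below), this objective is

\[\ell(\theta) = \sum_{n=1}^{N} \log p(x_n|\theta) = \sum_{n=1}^{N} \log \sum_{z} p(z|\theta)\, p(x_n|z, \theta).\]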
This optimization is difficult because the summation sits inside the log, so the objective cannot be decomposed into a sum of log factors.
If we look closer at the marginal of a single data point \(x\):
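Under the same assumptions as above, this marginal is

\[p(x|\theta) = \sum_{z} p(z|\theta)\, p(x|z, \theta).\]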
It is a mixture of distributions \(p(x|z)\) with weights \(p(z)\). If we assume that a single \(p(x|z)\) comes from an exponential family, its log-likelihood is concave (in the natural parameters), but the log of a weighted mixture of such distributions is in general no longer concave. Our objective is therefore non-convex, and we have to use approximate learning algorithms such as EM or variational methods.
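To make the log-sum structure concrete, here is a minimal sketch. It assumes a two-component 1-D Gaussian mixture as the LVM (an illustrative choice, not one made above) and evaluates the marginal log-likelihood with a numerically stable log-sum-exp:

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

# Toy LVM: a 1-D Gaussian mixture with K = 2 discrete latent states.
# (Illustrative parameters; any p(z) p(x|z) with discrete z behaves the same way.)
log_pz = np.log(np.array([0.3, 0.7]))                       # log p(z), mixture weights
means, stds = np.array([-2.0, 1.5]), np.array([1.0, 0.5])   # parameters of p(x|z)

def marginal_log_likelihood(x):
    """log p(D) = sum_n log sum_z p(z) p(x_n|z), computed with log-sum-exp."""
    # log p(x_n|z) for every data point and latent state, shape (N, K)
    log_px_given_z = norm.logpdf(x[:, None], loc=means, scale=stds)
    # The sum over z sits *inside* the log, so the result is not a sum of
    # log factors -- this is the source of the non-convexity discussed above.
    return logsumexp(log_pz + log_px_given_z, axis=1).sum()

x = np.array([-2.3, 0.1, 1.8])
print(marginal_log_likelihood(x))
```

Running this prints the marginal log-likelihood of the three toy observations under the fixed parameters; maximizing it over the means, scales, and weights is the non-convex learning problem described above.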