Gaussian process latent variable model
This model can be viewed as a combination of probabilistic PCA and kernels.
The probabilistic PCA model is defined as:
\(p(y_i \mid z_i, W) = \mathcal{N}(y_i \mid W z_i, \sigma^2 I), \qquad p(z_i) = \mathcal{N}(z_i \mid 0, I)\)
We can find the MLE for \(W\) by computing the eigenvectors of \(Y^T Y\).
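As a minimal sketch of this eigenvector solution (assuming `Y` is an \(N \times D\) matrix with centered columns and `L` is the chosen latent dimension; the function name is mine):

```python
import numpy as np

def ppca_mle(Y, L):
    """MLE of the PPCA loading matrix W and noise variance sigma^2 via the
    eigendecomposition of the sample covariance (Tipping & Bishop solution)."""
    N, D = Y.shape
    S = Y.T @ Y / N                              # sample covariance (Y assumed centered)
    evals, evecs = np.linalg.eigh(S)             # eigenvalues in ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]   # sort descending
    sigma2 = evals[L:].mean()                    # noise variance: mean of discarded eigenvalues
    W = evecs[:, :L] @ np.diag(np.sqrt(evals[:L] - sigma2))
    return W, sigma2
```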
In the dual problem we instead put a prior on the weights, \(p(W) = \prod_j \mathcal{N}(w_j \mid 0, I)\), integrate \(W\) out, and maximize over \(Z\). The resulting marginal likelihood is \(p(Y \mid Z, \sigma^2) = \prod_{d=1}^{D} \mathcal{N}(y_{:,d} \mid 0, K_z)\), where
\(K_z = ZZ^T + \sigma^2I\)
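As a quick sketch of evaluating this dual objective (the function below and its argument names are illustrative, not from the text):

```python
import numpy as np

def dual_log_marginal_likelihood(Z, Y, sigma2):
    """log p(Y | Z, sigma^2) = sum_d log N(y_{:,d} | 0, K_z), with K_z = Z Z^T + sigma^2 I."""
    N, D = Y.shape
    Kz = Z @ Z.T + sigma2 * np.eye(N)
    _, logdet = np.linalg.slogdet(Kz)
    alpha = np.linalg.solve(Kz, Y)               # K_z^{-1} Y
    return -0.5 * D * logdet - 0.5 * np.trace(Y.T @ alpha) - 0.5 * N * D * np.log(2 * np.pi)
```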
This dual problem can also be solved by finding the eigenvalues. If we use a linear kernel we recover PCA, but we can also use a more general kernel \(K_z = K + \sigma^2 I\), where \(K\) is a Gram matrix for \(Z\).
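For concreteness, here is one such Gram matrix for \(Z\), using an RBF kernel with illustrative (assumed) hyperparameters:

```python
import numpy as np
from scipy.spatial.distance import cdist

def rbf_gram(Z, variance=1.0, lengthscale=1.0):
    """Gram matrix with K[i, j] = variance * exp(-||z_i - z_j||^2 / (2 * lengthscale^2))."""
    sqdist = cdist(Z, Z, "sqeuclidean")
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

# K_z = rbf_gram(Z) + sigma2 * np.eye(len(Z))
```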
Unfortunately, we can no longer use the eigenvalue method; instead we use a gradient-based optimizer. Our loss function is the log marginal likelihood:
\(\ell(Z) = \log p(Y \mid Z, \sigma^2) = -\frac{D}{2}\log|K_z| - \frac{1}{2}\operatorname{tr}(K_z^{-1} Y Y^T) + \text{const}\)
The gradient is given by the chain rule: \(\frac{\partial \ell}{\partial Z_{ij}} = \operatorname{tr}\!\left(\frac{\partial \ell}{\partial K_z}\,\frac{\partial K_z}{\partial Z_{ij}}\right)\), where
\(\frac{\partial \ell}{\partial K_z} = \frac{1}{2}\left(K_z^{-1} Y Y^T K_z^{-1} - D K_z^{-1}\right)\)
and \(\frac{\partial K_z}{\partial Z_{ij}}\) depends on the kernel used; for the linear kernel, for example, \(\frac{\partial K_z}{\partial Z_{ij}} = E_{ij} Z^T + Z E_{ij}^T\), where \(E_{ij}\) has a one in entry \((i,j)\) and zeros elsewhere.
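Putting the pieces together, a sketch of one gradient-ascent step for the linear kernel \(K_z = ZZ^T + \sigma^2 I\) (step size and initialisation are arbitrary choices; for a non-linear kernel only the \(\partial K_z/\partial Z\) term changes):

```python
import numpy as np

def gplvm_grad_step(Z, Y, sigma2, lr=1e-3):
    """One gradient-ascent step on l(Z) = log p(Y | Z, sigma^2) for the
    linear kernel K_z = Z Z^T + sigma^2 I (illustrative sketch)."""
    N, D = Y.shape
    Kz = Z @ Z.T + sigma2 * np.eye(N)
    Kinv = np.linalg.inv(Kz)
    # dl/dK_z = (1/2) (K_z^{-1} Y Y^T K_z^{-1} - D K_z^{-1})
    G = 0.5 * (Kinv @ Y @ Y.T @ Kinv - D * Kinv)
    # For K_z = Z Z^T + sigma^2 I the chain rule collapses to dl/dZ = 2 G Z
    grad_Z = 2.0 * G @ Z
    return Z + lr * grad_Z

# Usage sketch: random init, then iterate until the latent coordinates settle.
# rng = np.random.default_rng(0)
# Y = rng.normal(size=(100, 5)); Y -= Y.mean(axis=0)
# Z = 0.1 * rng.normal(size=(100, 2))
# for _ in range(2000):
#     Z = gplvm_grad_step(Z, Y, sigma2=0.1)
```

In practice one would typically hand the objective and its analytic gradient to a quasi-Newton optimizer such as L-BFGS, and optimize the kernel hyperparameters with the same gradients.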
In kernelized PCA, we learn a kernelized mapping from the observed space to the latent space, whereas in GP-LVM, we learn a kernelized mapping from the latent space to the observed space.
GP-LVM inherits the usual advantages of probabilistic generative models: it can handle missing data and data of different types, it can use gradient-based methods (instead of grid search) to tune the kernel parameters, and it can incorporate prior information.