Sparse coding¶
linear factor model that has been studied as unsupervised feature learning and extraction
we assume that linear factors have Gaussian noise with isotropic precision \(\beta\)
\[
p(x|h) = N(x; Wh + b , \frac{1}{\beta}I)
\]
\(p(h)\) is strongly peak at 0 like a Laplace, Student or Cauchy.
The encoder is an optimization problem defined as:
\[\begin{split}
h^* = f(x) = \argmax_h p(h|x) \\
\argmax_h \log p(h|x) \\
= \argmin_h \lambda ||h||_1 + \beta ||x - Wh||^2_2
\end{split}\]
In training we alternate between minimizing \(h\) and \(W\), which booth are convex.