Bernoulli-Gaussian Model¶
It is the model of the form:
\[\begin{split}
y_i | x_i, w, \gamma, \sigma^2 \sim N(\sum_j \gamma_j w_j x_{ij}, \sigma^2) \\
\gamma_j \sim Ber(\pi_0) \\
w_j \sim N(0, \sigma^2_w)
\end{split}\]
In the signal processing literature this is called the Bernoulli-Gaussian model, although we could also call it the binary mask model, since we can think of the \(\gamma_j\) variables as “masking out” the weights \(w_j\) .
From this model we can derive \(l_0\) regularization:
\[f(w) = ||y - Xw ||_2^2 + \lambda ||w||_0\]
This is called \(l_0\) regularization, which converts the discrete optimization into a continuous one. However the \(l_0\) pseudo-norm makes the objective very non smooth so it is still hard to optimize.