Comparing OLS with Ridge and Lasso
If we assume that \(X\) has orthonormal columns, so that \(X^TX = I\), we can express the residual sum of squares (RSS) as:
\[\begin{split}
RSS(w) = ||y - Xw||^2 &= y^Ty + w^TX^TXw - 2w^TX^Ty \\
&= const + \sum_k w_k^2 - 2\sum_k \sum_i w_kx_{ik}y_i
\end{split}\]
The RSS therefore factorizes into a sum of terms, one per dimension \(k\), so each coefficient can be optimized independently of the others.
The OLS solution is given by:
\[
\hat{w}_k^{OLS} = x_{:k}^Ty
\]
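Here \(x_{:k}\) is the \(k\)-th column of \(X\), so \(\hat{w}_k^{OLS}\) is the inner product of feature \(k\) with the response: the coefficient of the orthogonal projection of the response vector onto feature \(k\).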
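As a quick numerical check of the closed form, the following is a minimal sketch that assumes NumPy and synthetic random data (the sizes, seed, and variable names are illustrative choices, not from the original). It builds a matrix with orthonormal columns via a reduced QR decomposition and compares \(X^Ty\) with a generic least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for reproducibility

# Build a design matrix with orthonormal columns from a random matrix
# using a reduced QR decomposition (assumed setup, synthetic data).
A = rng.normal(size=(50, 4))
X, _ = np.linalg.qr(A)           # X has shape (50, 4) and X.T @ X is the identity
y = rng.normal(size=50)

# Closed form under orthonormality: one inner product per feature column.
w_closed = X.T @ y

# Generic least-squares solution for comparison.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("columns orthonormal:", np.allclose(X.T @ X, np.eye(4)))
print("closed form matches lstsq:", np.allclose(w_closed, w_lstsq))
```

If the columns of `X` were not orthonormal, `X.T @ y` would generally differ from the least-squares solution, which is why the assumption matters here.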
Ridge
\[
\hat{w}_k^{RIDGE} = \frac{\hat{w}_k^{OLS}}{ 1 + \lambda}
\]
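A short sketch of where this shrinkage factor comes from, assuming the ridge objective is \(RSS(w) + \lambda ||w||_2^2\) with no extra scaling of the penalty (an assumption, since the objective is not written out here). Under \(X^TX = I\), the part of the objective that depends on \(w_k\), written here as \(J_k\) (a symbol introduced only for this illustration), is:

\[\begin{split}
J_k(w_k) &= w_k^2 - 2 w_k \hat{w}_k^{OLS} + \lambda w_k^2 \\
\frac{\partial J_k}{\partial w_k} &= 2(1 + \lambda) w_k - 2\hat{w}_k^{OLS} = 0
\quad \Rightarrow \quad w_k = \frac{\hat{w}_k^{OLS}}{1 + \lambda}
\end{split}\]

Every coefficient is scaled by the same factor \(1/(1 + \lambda)\), so ridge shrinks coefficients toward zero but does not set any of them exactly to zero.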
LASSO
\[
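\hat{w}_k^{LASSO} = \operatorname{sign}(\hat{w}_k^{OLS}) \left(|\hat{w}_k^{OLS}| - \frac{\lambda}{2}\right)_+
\]
where \((x)_+ = \max(x, 0)\) is the positive part, so any coefficient with \(|\hat{w}_k^{OLS}| \le \lambda/2\) is set exactly to zero (soft thresholding).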
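The two closed forms can be compared numerically. The sketch below assumes NumPy, synthetic data, and objectives of the form \(RSS(w) + \lambda ||w||_2^2\) for ridge and \(RSS(w) + \lambda ||w||_1\) for lasso (the scaling that matches the \(\lambda/2\) threshold). The function names `ridge_orthonormal` and `lasso_orthonormal` are hypothetical helpers, and the lasso helper uses the standard soft-thresholding form in which the shrunken magnitude is clipped at zero.

```python
import numpy as np

def ridge_orthonormal(w_ols, lam):
    """Ridge under X^T X = I: every coefficient is scaled by 1 / (1 + lam)."""
    return w_ols / (1.0 + lam)

def lasso_orthonormal(w_ols, lam):
    """Lasso under X^T X = I: soft thresholding at lam / 2 (clipped at zero)."""
    return np.sign(w_ols) * np.maximum(np.abs(w_ols) - lam / 2.0, 0.0)

rng = np.random.default_rng(1)   # arbitrary seed, synthetic data
X, _ = np.linalg.qr(rng.normal(size=(60, 5)))   # orthonormal columns
y = rng.normal(size=60)
lam = 0.5                        # illustrative regularization strength

w_ols = X.T @ y
w_ridge = ridge_orthonormal(w_ols, lam)
w_lasso = lasso_orthonormal(w_ols, lam)

# Ridge check: compare with the general closed form (X^T X + lam I)^{-1} X^T y.
w_ridge_general = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)

# Lasso check: the objective is convex, so its value at w_lasso should not
# exceed its value at any perturbed point.
def lasso_objective(w):
    return np.sum((y - X @ w) ** 2) + lam * np.sum(np.abs(w))

perturbed = w_lasso + 0.01 * rng.normal(size=(200, 5))
best_nearby = min(lasso_objective(w) for w in perturbed)

print("ridge matches general formula:", np.allclose(w_ridge, w_ridge_general))
print("lasso is no worse than nearby points:",
      lasso_objective(w_lasso) <= best_nearby + 1e-12)
```

The contrast is visible in the helpers themselves: ridge rescales every coefficient by the same factor, while lasso subtracts a fixed amount from each magnitude and zeroes out anything that falls below the threshold. Libraries that scale the loss differently (for example by \(1/(2n)\)) would need \(\lambda\) rescaled accordingly before their output matches these formulas.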