SVM classification
We use the hinge loss, \(L_{\text{hinge}}(y, \eta) = \max(0, 1 - y\eta)\), as our loss function, and our objective is:

\[
\min_{w, w_0} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{N} \left(1 - y_i f(x_i)\right)_+
\]

where \(f(x_i) = w^T x_i + w_0\) and \((z)_+ = \max(0, z)\).
This is again non-differentiable, so we introduce slack variables \(\xi_i \geq 0\) and rewrite the objective as:

\[
\min_{w, w_0, \xi} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i
\quad \text{s.t.} \quad \xi_i \geq 0, \;\; y_i (w^T x_i + w_0) \geq 1 - \xi_i, \;\; i = 1, \ldots, N
\]

This is a quadratic program.
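To make the QP structure concrete, the slack-variable program can be handed directly to a general-purpose solver. Below is a minimal sketch, assuming cvxpy and NumPy are available; the toy data and all variable names are made up for illustration.

```python
# Minimal sketch: the soft-margin SVM written directly as a QP via cvxpy
# (an assumed dependency), on made-up toy data.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
N, D, C = 50, 2, 1.0
X = rng.normal(size=(N, D))                                       # toy inputs
y = np.where(X[:, 0] + 0.3 * rng.normal(size=N) > 0, 1.0, -1.0)   # toy labels

w, w0, xi = cp.Variable(D), cp.Variable(), cp.Variable(N)
objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
# xi >= 0 and y_i (w^T x_i + w0) >= 1 - xi_i, as in the program above.
constraints = [xi >= 0, cp.multiply(y, X @ w + w0) >= 1 - xi]
cp.Problem(objective, constraints).solve()
print(w.value, w0.value)
```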
Standard QP solvers take \(O(N^3)\) time, but there are specialized algorithms, such as sequential minimal optimization (SMO), that in practice take \(O(N^2)\). For large problems it is best to use a linear SVM, which can be trained in \(O(N)\) time.
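In practice one uses such a dedicated solver rather than a generic QP package. A minimal sketch using scikit-learn (an assumed dependency, not part of the original text): `SVC` wraps the SMO-based libsvm solver, while `LinearSVC` wraps liblinear, which scales roughly linearly in \(N\).

```python
# Minimal sketch (assuming scikit-learn): SMO-based kernel SVM vs. linear SVM.
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

kernel_svm = SVC(kernel="rbf", C=1.0).fit(X, y)   # libsvm: SMO-style solver
linear_svm = LinearSVC(C=1.0).fit(X, y)           # liblinear: ~O(N) training

print(kernel_svm.score(X, y), linear_svm.score(X, y))
```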
The solution has the form:

\[
\hat{w} = \sum_i \alpha_i x_i, \quad \text{where } \alpha_i = \lambda_i y_i
\]

and the \(\lambda_i\) are the Lagrange multipliers of the dual problem.
\(\alpha\) is sparse because of the hinge loss. The \(x_i\) for which \(\alpha_i \neq 0\) are called support vectors; these are the points that are either incorrectly classified, or are classified correctly but lie on or inside the margin.
The prediction is done using:

\[
\hat{y}(x) = \operatorname{sgn}\left(\hat{w}_0 + \hat{w}^T x\right) = \operatorname{sgn}\Big(\hat{w}_0 + \sum_i \alpha_i \, x_i^T x\Big)
\]

where the sum effectively runs only over the support vectors, since the other \(\alpha_i\) are zero.
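As a check, the dual coefficients that scikit-learn exposes after fitting correspond to \(\alpha_i = \lambda_i y_i\), and the prediction rule above can be reproduced from the support vectors alone. A minimal self-contained sketch, again assuming scikit-learn and made-up toy data:

```python
# Minimal sketch: inspect the sparse dual solution and reproduce the decision
# value w0 + sum_i alpha_i <x_i, x> using only the support vectors.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

alpha = clf.dual_coef_.ravel()   # alpha_i = lambda_i * y_i (nonzero entries only)
sv = clf.support_vectors_        # the x_i with alpha_i != 0
print(sv.shape[0], "support vectors out of", len(X))

scores = X @ sv.T @ alpha + clf.intercept_   # w0 + sum_i alpha_i <x_i, x>
assert np.allclose(scores, clf.decision_function(X))
```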
Alternatively, we can derive the SVM from the large margin principle.