Learning in conditional random fields
A conditional random field is a probability distribution of the form:

\[
p(y \mid x) = \frac{1}{Z(x, \varphi)} \prod_{c \in C} \varphi_c(y_c, x),
\]

where

\[
Z(x, \varphi) = \sum_{y} \prod_{c \in C} \varphi_c(y_c, x)
\]

is the partition function. Here, each factor \(\varphi_c\) depends on \(x\) in addition to \(y\); the \(x\) variables are fixed, and we get a distribution only over \(y\).
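To make the conditioning concrete, here is a minimal sketch (the 3-variable binary chain, the potential function, and all numbers are illustrative assumptions, not from the text) that computes \(Z(x, \varphi)\) and \(p(y \mid x)\) by enumerating the assignments to \(y\):

```python
import itertools

import numpy as np

def potential(y_i, y_j, x):
    # A hypothetical pairwise factor phi_c(y_c, x): neighbors that agree
    # get a boost whose strength depends on the (scalar) input x.
    return np.exp(x if y_i == y_j else 0.0)

x = 1.5  # the conditioning input: fixed, never summed over

# Partition function Z(x, phi): sum over all assignments to y only.
Z = sum(
    potential(y[0], y[1], x) * potential(y[1], y[2], x)
    for y in itertools.product([0, 1], repeat=3)
)

# Conditional probability of one particular assignment y given x.
y = (0, 0, 1)
p = potential(y[0], y[1], x) * potential(y[1], y[2], x) / Z
print(f"Z(x) = {Z:.3f}, p(y | x) = {p:.4f}")
```

Note that a different input \(x\) changes the potentials, so \(Z(x, \varphi)\) must be recomputed per input.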
As with learning in Markov random fields, we re-parametrize \(p\) as

\[
p(y \mid x) = \frac{\exp(\theta^\top f(x, y))}{Z(x, \theta)},
\]

where \(f(x, y)\) is a vector of indicator functions and \(\theta\) is a re-parametrization of the model parameters.
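Continuing the toy chain above, here is a sketch of this log-linear form under one assumed choice of \(f\): an indicator per (edge, configuration) pair, scaled by the input so that the features depend on \(x\) as well as \(y\):

```python
import itertools

import numpy as np

# Hypothetical indicator features for the 3-node binary chain above:
# one entry per (edge, configuration) pair, 2 edges x 4 configs = 8.
def features(x, y):
    f = np.zeros(8)
    for e, (i, j) in enumerate([(0, 1), (1, 2)]):
        f[4 * e + 2 * y[i] + y[j]] = x  # indicator, scaled by the input x
    return f

theta = np.random.default_rng(0).normal(size=8)  # arbitrary parameters
x = 1.5

# p(y | x) = exp(theta^T f(x, y)) / Z(x, theta), by enumerating y.
ys = list(itertools.product([0, 1], repeat=3))
scores = np.array([theta @ features(x, y) for y in ys])
probs = np.exp(scores - scores.max())  # max-subtraction for stability
probs /= probs.sum()  # normalizing implements the division by Z(x, theta)
print(ys[int(probs.argmax())], round(float(probs.max()), 4))
```

As in the previous sketch, the normalizer \(Z(x, \theta)\) must be recomputed for every distinct \(x\).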
The log-likelihood for a dataset \(D\) is:

\[
\frac{1}{|D|} \sum_{(x, y) \in D} \theta^\top f(x, y) - \frac{1}{|D|} \sum_{x \in D} \log Z(x, \theta).
\]
The gradient of this log-likelihood is:

\[
\frac{1}{|D|} \sum_{(x, y) \in D} f(x, y) - \frac{1}{|D|} \sum_{x \in D} \mathbb{E}_{y \sim p(y \mid x)}[f(x, y)],
\]

i.e., the empirical feature counts minus the feature counts expected under the model.
And the Hessian of the \(\log Z(x, \theta)\) term is just the covariance matrix of the features:

\[
\operatorname{cov}_{y \sim p(y \mid x)}[f(x, y)],
\]

so the log-likelihood is concave in \(\theta\).
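The sketch below performs one brute-force gradient step under these formulas, on the same assumed toy model; the enumeration inside `expected_features` stands in for a real inference algorithm such as belief propagation on the chain:

```python
import itertools

import numpy as np

def features(x, y):  # the same hypothetical features as in the sketch above
    f = np.zeros(8)
    for e, (i, j) in enumerate([(0, 1), (1, 2)]):
        f[4 * e + 2 * y[i] + y[j]] = x
    return f

def expected_features(theta, x):
    # One inference query: E_{y ~ p(y|x)}[f(x, y)] for THIS x. Done here
    # by enumeration; a real implementation would use, e.g., belief
    # propagation on the chain.
    ys = list(itertools.product([0, 1], repeat=3))
    scores = np.array([theta @ features(x, y) for y in ys])
    p = np.exp(scores - scores.max())
    p /= p.sum()
    return sum(p_i * features(x, y) for p_i, y in zip(p, ys))

theta = np.zeros(8)
data = [(1.5, (0, 0, 1)), (0.5, (1, 1, 1))]  # made-up (x, y) pairs

# Gradient = empirical feature counts minus expected feature counts.
# Note the loop: one inference query per training point, every step.
grad = np.mean(
    [features(x, y) - expected_features(theta, x) for x, y in data], axis=0
)
theta += 0.1 * grad  # one step of gradient ascent on the log-likelihood
```

Note that `expected_features` is invoked once per training point inside the loop, which is exactly the cost discussed next.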
Unfortunately, evaluating the gradient requires one inference query per training data point \((x, y)\) at every gradient step, in order to compute the expectation \(\mathbb{E}_{y \sim p(y \mid x)}[f(x, y)]\). This makes learning CRFs more expensive than learning MRFs, where a single inference query per gradient step suffices because the model expectation there does not depend on \(x\).
In practice, a more popular family of objective functions for training CRFs is the “max-margin” loss, which generalizes the objective used to train SVMs. Models trained with this loss are called structured support vector machines.
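As a sketch of what this loss looks like for a single example, assuming a Hamming-distance margin cost \(\Delta\) (a common choice, but not one specified here) and the same toy features as above:

```python
import itertools

import numpy as np

def features(x, y):  # the same hypothetical features as in earlier sketches
    f = np.zeros(8)
    for e, (i, j) in enumerate([(0, 1), (1, 2)]):
        f[4 * e + 2 * y[i] + y[j]] = x
    return f

def hinge_loss(theta, x, y_true):
    # Hamming-distance margin cost (an assumed choice of Delta).
    delta = lambda y: sum(a != b for a, b in zip(y, y_true))
    # Loss-augmented decoding: the highest-scoring margin violator. For
    # realistic y-spaces this max is itself a combinatorial inference task.
    worst = max(
        theta @ features(x, y) + delta(y)
        for y in itertools.product([0, 1], repeat=3)
    )
    # Non-negative, since y = y_true is among the candidates with delta = 0.
    return worst - theta @ features(x, y_true)

theta = np.random.default_rng(0).normal(size=8)
print(hinge_loss(theta, 1.5, (0, 0, 1)))
```

The inner maximization (loss-augmented decoding) replaces the partition-function computation of maximum-likelihood training.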