Probit regression¶
Probit regression is a generalized linear model in which the inverse link function is the CDF of the standard normal distribution:

\(p(y_i = 1 \mid x_i, w) = \Phi(w^T x_i)\)

where
\(\Phi\) is the standard normal CDF
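As a concrete illustration, here is a minimal sketch of the predictive probability \(\Phi(w^T x)\), computing the normal CDF via the error function; the helper names and toy inputs are my own, not from the text.

```python
import math

def Phi(a):
    """Standard normal CDF, written via the error function:
    Phi(a) = 0.5 * (1 + erf(a / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def probit_predict(w, x):
    """p(y=1 | x, w) = Phi(w^T x); w and x are plain Python lists."""
    mu = sum(wj * xj for wj, xj in zip(w, x))
    return Phi(mu)

probit_predict([0.0, 0.0], [1.0, 1.0])  # -> 0.5, since Phi(0) = 0.5
```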
MLE¶
We can use gradient descent to find the MLE. Let \(\mu_i = w^T x_i\) and \(\tilde{y}_i \in \{ -1, +1 \}\). The gradient of the log-likelihood for a single case is

\(g_i \triangleq \frac{d}{dw} \log p(\tilde{y}_i \mid w^T x_i) = x_i \frac{\tilde{y}_i \phi(\mu_i)}{\Phi(\tilde{y}_i \mu_i)}\)

where
\(\phi\) is the standard normal PDF
\(\Phi\) is the standard normal CDF
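The per-case gradient above can be turned into a simple batch gradient-ascent routine. This is a sketch with assumed names (`grad_loglik`, `fit_mle`), a fixed step size, and plain Python lists rather than any particular array library.

```python
import math

def Phi(a):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def phi(a):
    """Standard normal PDF."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)

def grad_loglik(w, X, y_tilde):
    """Sum over cases of g_i = x_i * ytilde_i * phi(mu_i) / Phi(ytilde_i * mu_i),
    with labels ytilde_i in {-1, +1}."""
    g = [0.0] * len(w)
    for x, yt in zip(X, y_tilde):
        mu = sum(wj * xj for wj, xj in zip(w, x))
        coef = yt * phi(mu) / Phi(yt * mu)
        for j in range(len(w)):
            g[j] += coef * x[j]
    return g

def fit_mle(X, y_tilde, lr=0.1, iters=500):
    """Plain gradient ascent on the log-likelihood (fixed step size lr)."""
    w = [0.0] * len(X[0])
    for _ in range(iters):
        g = grad_loglik(w, X, y_tilde)
        w = [wj + lr * gj for wj, gj in zip(w, g)]
    return w
```

Note that on linearly separable data the MLE is not finite, so in practice one stops early or adds a prior as in the MAP section below.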
MAP¶
If we put a Gaussian prior \(p(w) = \mathcal{N}(0, V_0)\) on the weights, the gradient of the penalized log-likelihood becomes

\(\sum_i g_i - V_0^{-1} w\)
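A sketch of the corresponding penalized gradient, assuming for simplicity a spherical prior \(V_0 = v_0 I\); the function name and the default `v0` value are illustrative, not from the text.

```python
import math

def Phi(a):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def phi(a):
    """Standard normal PDF."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)

def grad_penalized(w, X, y_tilde, v0=10.0):
    """sum_i g_i - V0^{-1} w, with the spherical prior V0 = v0 * I."""
    g = [-wj / v0 for wj in w]  # prior term: -V0^{-1} w
    for x, yt in zip(X, y_tilde):
        mu = sum(wj * xj for wj, xj in zip(w, x))
        coef = yt * phi(mu) / Phi(yt * mu)
        for j in range(len(w)):
            g[j] += coef * x[j]
    return g
```

The only change from the MLE gradient is the shrinkage term \(-V_0^{-1} w\), which pulls the weights toward zero and keeps the MAP estimate finite even on separable data.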
Latent variable interpretation¶
Let us associate each item \(x_i\) with two latent utilities, \(u_{0i}\) and \(u_{1i}\), corresponding to the possible choices \(y_i = 0\) and \(y_i = 1\). We assume that the observed choice is whichever action has the larger utility:

\(u_{0i} = w_0^T x_i + \delta_{0i}\)

\(u_{1i} = w_1^T x_i + \delta_{1i}\)

\(y_i = \mathbb{I}(u_{1i} > u_{0i})\)

The \(\delta\)'s are error terms, representing all the other factors that might be relevant in decision making that we have chosen not to model, and they are assumed to have a Gaussian distribution.
This representation is called a random utility model (RUM).
Since only the difference in utilities matters, we define

\(z_i \triangleq w^T x_i + \epsilon_i\)

where \(w = w_1 - w_0\) and

\(\epsilon_i \triangleq \delta_{1i} - \delta_{0i}, \quad \epsilon_i \sim \mathcal{N}(0, 1)\)

This makes

\(y_i = 1 \iff z_i > 0\)

We call this the difference RUM, or dRUM, model.
Now if we marginalize out \(z_i\), we recover the probit model:

\(p(y_i = 1 \mid x_i, w) = p(z_i > 0) = p(\epsilon_i > -w^T x_i) = 1 - \Phi(-w^T x_i) = \Phi(w^T x_i)\)
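A quick Monte Carlo check of this marginalization: sampling \(z_i = w^T x_i + \epsilon_i\) with \(\epsilon_i \sim \mathcal{N}(0,1)\) and thresholding at zero should reproduce \(\Phi(w^T x_i)\). The sample size, seed, and test value of \(\mu\) below are arbitrary choices for illustration.

```python
import math
import random

def Phi(a):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def mc_prob_y1(mu, n=200_000, seed=0):
    """Estimate p(y=1) by sampling z = mu + eps, eps ~ N(0,1),
    and counting how often z > 0."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if mu + rng.gauss(0.0, 1.0) > 0)
    return hits / n

mu = 0.7
mc_prob_y1(mu)  # close to Phi(0.7) ≈ 0.758
```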