Linear discriminant analysis

Linear discriminant analysis (LDA) is a Gaussian model for classification, similar to Quadratic discriminant analysis: we assume a Gaussian distribution for each class, but instead of a class-specific covariance matrix $\Sigma_c$ we use a single shared covariance matrix $\Sigma$.

Starting from $p(y=c|x,\theta) \propto \pi_c\,\mathcal{N}(x \mid \mu_c, \Sigma)$ and expanding the quadratic in the exponent:

$$p(y=c|x,\theta) \propto \pi_c \exp\left[\mu_c^T\Sigma^{-1}x - \frac{1}{2}x^T\Sigma^{-1}x - \frac{1}{2}\mu_c^T\Sigma^{-1}\mu_c\right]$$

$$p(y=c|x,\theta) \propto \exp\left[\mu_c^T\Sigma^{-1}x - \frac{1}{2}\mu_c^T\Sigma^{-1}\mu_c + \log\pi_c\right]\exp\left[-\frac{1}{2}x^T\Sigma^{-1}x\right]$$

The quadratic form:

$$x^T\Sigma^{-1}x$$

is class-independent, so it cancels with the normalizing constant in the denominator if we define:

$$\gamma_c = -\frac{1}{2}\mu_c^T\Sigma^{-1}\mu_c + \log\pi_c$$

$$\beta_c = \Sigma^{-1}\mu_c$$

Hence we get:

$$p(y=c|x,\Sigma) = \frac{e^{\beta_c^T x + \gamma_c}}{\sum_{c'} e^{\beta_{c'}^T x + \gamma_{c'}}} = S(\eta)_c$$

Where:

  • $\eta = [\beta_1^T x + \gamma_1, \ldots, \beta_C^T x + \gamma_C]$

  • S is the softmax function

For detailed derivation from QDA to LDA

In the context of LDA, taking the log of the softmax gives a function that is linear in $x$, which makes the decision boundaries between classes linear.
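A minimal NumPy sketch of the equations above (the function name `lda_posteriors` and the array layout are my own choices, and the parameters $\mu_c$, $\Sigma$, $\pi_c$ are assumed to be already estimated): it forms $\beta_c$ and $\gamma_c$, computes $\eta$, which is linear in $x$, and applies the softmax.

```python
import numpy as np

def lda_posteriors(X, mus, Sigma, priors):
    """Class posteriors p(y=c|x) for LDA with per-class means `mus` (C x D),
    shared covariance `Sigma` (D x D) and class priors `priors` (C,)."""
    Sigma_inv = np.linalg.inv(Sigma)
    betas = mus @ Sigma_inv                               # rows are beta_c = Sigma^{-1} mu_c
    gammas = -0.5 * np.sum(betas * mus, axis=1) + np.log(priors)  # gamma_c
    eta = X @ betas.T + gammas                            # eta_{n,c} = beta_c^T x_n + gamma_c (linear in x)
    eta -= eta.max(axis=1, keepdims=True)                 # stabilise the softmax
    p = np.exp(eta)
    return p / p.sum(axis=1, keepdims=True)               # softmax over classes
```

With estimated parameters plugged in, `np.argmax(lda_posteriors(X, mus, Sigma, priors), axis=1)` gives hard class predictions.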

MLE

If we maximize the log likelihood we get:

$$\hat{\mu}_c = \frac{1}{N_c}\sum_{i:y_i=c} x_i$$

$$\hat{\Sigma}_c = \frac{1}{N_c}\sum_{i:y_i=c} (x_i-\hat{\mu}_c)(x_i-\hat{\mu}_c)^T$$

$$\hat{\pi}_c = \frac{N_c}{N}$$

For the shared covariance $\Sigma$ used by LDA, the per-class scatters are pooled: $\hat{\Sigma} = \frac{1}{N}\sum_c\sum_{i:y_i=c}(x_i-\hat{\mu}_c)(x_i-\hat{\mu}_c)^T$.
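A small NumPy sketch of these estimators (the function name `lda_mle` is mine); it computes the per-class means and priors and pools the class scatters into the shared covariance:

```python
import numpy as np

def lda_mle(X, y):
    """MLE for LDA from data X (N x D) and integer labels y (N,):
    per-class means, pooled shared covariance and class priors."""
    N, D = X.shape
    classes = np.unique(y)
    pis = np.array([np.mean(y == c) for c in classes])          # pi_c = N_c / N
    mus = np.array([X[y == c].mean(axis=0) for c in classes])   # mu_c
    Sigma = np.zeros((D, D))
    for c, mu in zip(classes, mus):
        Xc = X[y == c] - mu
        Sigma += Xc.T @ Xc                                      # class scatter
    return mus, Sigma / N, pis                                  # pooled covariance uses 1/N
```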

Regularization

The MLE tends to be ill-conditioned (the covariance matrix may be singular) in high-dimensional settings, so we have to introduce some mechanism to avoid overfitting.

Regularized LDA

We perform MAP estimation of $\Sigma$ using an inverse Wishart prior of the form $\mathrm{IW}(\mathrm{diag}(\hat{\Sigma}_{MLE}), \nu_0)$, hence we have:

$$\hat{\Sigma} = \lambda\,\mathrm{diag}(\hat{\Sigma}_{MLE}) + (1-\lambda)\,\hat{\Sigma}_{MLE}$$

Where:

  • $\lambda$ controls the amount of regularization.
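A one-line sketch of this shrinkage estimator (the function name is illustrative):

```python
import numpy as np

def regularized_cov(Sigma_mle, lam):
    """Shrink the MLE covariance towards its diagonal:
    lam * diag(Sigma_mle) + (1 - lam) * Sigma_mle."""
    return lam * np.diag(np.diag(Sigma_mle)) + (1 - lam) * Sigma_mle
```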

It is not feasible to compute $\hat{\Sigma}_{MLE}$ directly if $D > N$ (a wide, short design matrix), but we can use the SVD of $X$:

$$X = UDV^T, \qquad Z = UD$$

$$\hat{\Sigma}_{MLE} = V\left(\frac{1}{N}Z^TZ - \mu_z\mu_z^T\right)V^T$$

Where:

  • $X$ is the design matrix
  • $Z = UD$ is the representation of $X$ in the lower-dimensional ($N \times N$) space
  • $\mu_z = V^T\bar{x}$ is the mean expressed in that space
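A quick numerical sanity check of the SVD identity (assuming, as above, that the mean inside the brackets is the one expressed in the $Z$ coordinates, $\mu_z = V^T\bar{x}$):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 20, 100                                     # wide, short design matrix (D > N)
X = rng.normal(size=(N, D))

# Direct empirical covariance (D x D, singular when D > N)
mu = X.mean(axis=0)
Sigma_direct = (X - mu).T @ (X - mu) / N

# SVD route: only N x N matrices appear inside the brackets
U, d, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(d) V^T
Z = U * d                                          # Z = U D, shape (N, N)
mu_z = Vt @ mu                                     # mean expressed in the Z coordinates
Sigma_svd = Vt.T @ (Z.T @ Z / N - np.outer(mu_z, mu_z)) @ Vt

print(np.allclose(Sigma_direct, Sigma_svd))        # True
```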

Diagonal LDA

If we use regularized LDA but set $\lambda = 1$, we get a special variant called diagonal LDA, which keeps only the diagonal of $\hat{\Sigma}_{MLE}$ (the per-feature variances).
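A tiny numeric check that $\lambda = 1$ drops the off-diagonal covariances (the example matrix is made up):

```python
import numpy as np

# With lambda = 1 the shrinkage estimator keeps only the per-feature variances.
Sigma_mle = np.array([[2.0, 0.5],
                      [0.5, 1.0]])
lam = 1.0
Sigma_hat = lam * np.diag(np.diag(Sigma_mle)) + (1 - lam) * Sigma_mle
print(Sigma_hat)   # [[2. 0.] [0. 1.]] -> off-diagonal entries are dropped
```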