Derivation of BIC¶
If we take the Gaussian approximation to the marginal likelihood and drop the irelevant terms:s
\(E(\theta^*) = \log p(\theta^*, D)\)
\(p(D| \theta^*)\) is also known as Occams factor and it measures the model complexity
If we assume uniform prior \(p(\theta)\propto 1\) we can replace \(\theta^*\) by the MLE \(\hat{\theta}\) and drop \(\log p(\theta^*)\)
Thus we remain with
We need to approximate \(\log |H|\) also alled Occmas factor since it measures the model xomplexity (volue of the osterioir distribution). $\(H = \sum_{i=1}^N H_i\)$
\(H_i = \nabla \nabla \log p(D_i| \theta)\)
Each \(H_i\) can be approximated by a fixed matrix
\(\hat{H}\) is a fixed matrix
Than we have:
\(D = \dim(\theta)\)
\(H\) is full rank
(\(N^d\) is from rules of determinats sice we multiply each row by N).
Now \(\log |\hat{H}|\) is independent of N, it will be overwhelmed by the likelihood hence we can approximate the marignal likelihood as: