Bayesian Statistics¶
The main goal of Bayesian statistics is finding the posterior distribution, that is, the distribution of the parameters of interest given the data.
To find this distribution we must make a couple of assumptions:

- We describe the data given the parameters, \(p(D|\theta)\); this is commonly known as the likelihood.
- We express our belief about the parameters, \(p(\theta)\); this is known as the prior.
Together this forms the following equations:

\[p(\theta|D) \propto p(D|\theta)\,p(\theta)\]

\[p(\theta|D) = \frac{p(D|\theta)\,p(\theta)}{p(D)}\]

The latter equation contains an additional term:

\[p(D) = \int p(D|\theta)\,p(\theta)\,d\theta\]

This is also called the evidence, and it is what makes Bayesian statistics hard, since computing this integral is nontrivial.
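As a concrete illustration, here is a minimal sketch that evaluates Bayes' theorem numerically on a grid; the coin-flip data (7 heads in 10 flips) and the Beta(2, 2) prior are assumptions made up for this example. The evidence appears simply as the normalizing constant.

```python
import numpy as np
from scipy import stats

# Hypothetical data: 7 heads in 10 coin flips (assumed for illustration)
heads, flips = 7, 10

# Grid over theta, the probability of heads
theta = np.linspace(0.001, 0.999, 999)
dtheta = theta[1] - theta[0]

prior = stats.beta.pdf(theta, 2, 2)                # p(theta): mild belief the coin is roughly fair
likelihood = stats.binom.pmf(heads, flips, theta)  # p(D|theta)

numerator = likelihood * prior                     # p(D|theta) p(theta)
evidence = np.sum(numerator) * dtheta              # p(D), approximated by a Riemann sum
posterior = numerator / evidence                   # p(theta|D)

print(f"evidence p(D) ≈ {evidence:.4f}")
print(f"posterior mean ≈ {np.sum(theta * posterior) * dtheta:.3f}")
```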
Finding the posterior distribution (Inference)¶
There are multiple approaches to finding \(p(\theta|D)\), depending on whether we want an exact or an approximate solution:
Analytical solution using conjugate priors¶
Computing \(p(\theta|D)\) can be nontrivial, but in some cases it can be done analytically. This requires choosing a prior that is conjugate to the likelihood.
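For example, a Beta prior is conjugate to a Binomial likelihood, so the posterior is again a Beta distribution with a closed-form update. A minimal sketch, reusing the assumed coin-flip data and Beta(2, 2) prior from above:

```python
from scipy import stats

# Hypothetical coin-flip data (assumed for illustration)
heads, flips = 7, 10

# Beta(a, b) prior on theta; conjugate to the Binomial likelihood
a_prior, b_prior = 2, 2

# Conjugate update: posterior is Beta(a + heads, b + tails)
a_post = a_prior + heads
b_post = b_prior + (flips - heads)

posterior = stats.beta(a_post, b_post)
print(f"posterior mean        = {posterior.mean():.3f}")
print(f"95% credible interval = {posterior.interval(0.95)}")
```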
Markov chain Monte Carlo¶
We build a Markov chain whose stationary distribution is the posterior distribution.
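A minimal sketch of the idea using a random-walk Metropolis sampler, the simplest MCMC variant, on the same assumed coin-flip posterior; the proposal scale and iteration counts are arbitrary choices for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
heads, flips = 7, 10  # hypothetical data, as above

def log_unnormalized_posterior(theta):
    # log p(D|theta) + log p(theta); the evidence cancels in the acceptance ratio
    if not 0 < theta < 1:
        return -np.inf
    return stats.binom.logpmf(heads, flips, theta) + stats.beta.logpdf(theta, 2, 2)

samples = []
theta = 0.5                                         # start the chain at an arbitrary point
log_p = log_unnormalized_posterior(theta)
for _ in range(20_000):
    proposal = theta + rng.normal(scale=0.1)        # random-walk proposal
    log_p_new = log_unnormalized_posterior(proposal)
    if np.log(rng.uniform()) < log_p_new - log_p:   # Metropolis acceptance rule
        theta, log_p = proposal, log_p_new
    samples.append(theta)

samples = np.array(samples[5_000:])                 # drop burn-in
print(f"posterior mean ≈ {samples.mean():.3f}")
```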
Variational approximation (Inference)¶
We try to approximate the posterior with a distribution from a tractable family. We want this approximation to be as tight as possible.
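A minimal sketch of the idea, assuming a Gaussian variational family \(q(\theta) = \mathcal{N}(\mu, \sigma^2)\) and a crude grid search over its parameters to maximize a Monte Carlo estimate of the ELBO; the data, prior, and grid ranges are all assumptions carried over from the earlier example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
heads, flips = 7, 10  # hypothetical data, as above

def log_joint(theta):
    # log p(D|theta) + log p(theta); zero density outside (0, 1)
    out = np.full_like(theta, -np.inf)
    inside = (theta > 0) & (theta < 1)
    out[inside] = (stats.binom.logpmf(heads, flips, theta[inside])
                   + stats.beta.logpdf(theta[inside], 2, 2))
    return out

def elbo(mu, sigma, n_samples=2_000):
    # Monte Carlo estimate of E_q[log p(D, theta)] plus the entropy of q
    theta = rng.normal(mu, sigma, size=n_samples)
    return log_joint(theta).mean() + stats.norm.entropy(loc=mu, scale=sigma)

# Crude grid search over the variational parameters of q(theta) = Normal(mu, sigma)
candidates = [(mu, sigma)
              for mu in np.linspace(0.4, 0.8, 21)
              for sigma in np.linspace(0.05, 0.25, 21)]
mu_best, sigma_best = max(candidates, key=lambda p: elbo(*p))
print(f"q(theta) ≈ Normal(mu={mu_best:.2f}, sigma={sigma_best:.2f})")
```

In practice the ELBO is maximized with gradient-based optimization rather than a grid search; the grid keeps this sketch short.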
Model averaging¶
We can use the posterior probability of each model, \(p(M|D)\), to average predictions over models.
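One standard way to write this (Bayesian model averaging), with candidate models \(M_1, \dots, M_K\) and a quantity \(\tilde{y}\) we want to predict:

\[p(\tilde{y} \mid D) = \sum_{k=1}^{K} p(\tilde{y} \mid M_k, D)\, p(M_k \mid D)\]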
Model selection¶
Sometimes, instead of averaging, we need to choose a single best model.
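A common criterion is to pick the model with the highest posterior probability; with a uniform prior over models this amounts to comparing their evidences \(p(D|M_k)\):

\[\hat{M} = \arg\max_{M_k}\, p(M_k \mid D)\]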
Posterior predictive distribution¶
The posterior is our belief about the world. We can test whether this belief is justified by checking whether it predicts observed quantities. By drawing from the posterior we obtain the posterior predictive distribution.
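Formally, \(p(\tilde{D} \mid D) = \int p(\tilde{D} \mid \theta)\, p(\theta \mid D)\, d\theta\). A minimal sketch of sampling from it, continuing the hypothetical coin-flip example and its Beta posterior from the conjugate update above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
heads, flips = 7, 10                               # hypothetical data, as above
a_post, b_post = 2 + heads, 2 + (flips - heads)    # Beta posterior from the conjugate update

# Posterior predictive: draw theta ~ p(theta|D), then new data ~ p(y|theta)
theta_draws = stats.beta.rvs(a_post, b_post, size=10_000, random_state=rng)
predicted_heads = stats.binom.rvs(flips, theta_draws, random_state=rng)

print(f"predicted heads in 10 new flips: mean ≈ {predicted_heads.mean():.2f}")
print(f"P(at least 7 heads again) ≈ {(predicted_heads >= 7).mean():.2f}")
```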
Bayesian Learning vs Maximum Likelihood¶
Here we outline the basic difference between maximum likelihood learning and Bayesian learning: maximum likelihood returns a single point estimate \(\hat{\theta} = \arg\max_\theta p(D|\theta)\), while Bayesian learning keeps the full posterior distribution \(p(\theta|D)\).
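A minimal sketch of the contrast, again on the assumed coin-flip data and Beta(2, 2) prior used throughout these examples:

```python
from scipy import stats

heads, flips = 7, 10  # hypothetical data, as above

# Maximum likelihood: a single point estimate of theta
theta_mle = heads / flips

# Bayesian learning: a full posterior distribution over theta
posterior = stats.beta(2 + heads, 2 + (flips - heads))
lo, hi = posterior.interval(0.95)

print(f"MLE point estimate:    {theta_mle:.3f}")
print(f"posterior mean:        {posterior.mean():.3f}")
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
```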