Student T¶
Similar to the Gaussian distribution, but with heavier tails.
PDF:¶
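\[
p(x \mid \mu, \sigma^2, \nu) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}\,\sigma}\left(1 + \frac{1}{\nu}\left(\frac{x-\mu}{\sigma}\right)^2\right)^{-\frac{\nu+1}{2}}
\]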
Here
\(\mu\) is the mean of the distribution
\(\sigma^2 > 0\) is the scale parameter
\(\nu > 0\) is called the degrees of freedom
If \(\nu = 1\), \(\mu = 0\), \(\sigma = 1\), then the 95% HDI is \([-12.7, 12.7]\).
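Because the distribution is symmetric and unimodal, the central 95% interval coincides with the 95% HDI, so the figure can be checked from the quantiles. A minimal sketch with scipy (scipy's standard t corresponds to \(\mu = 0\), \(\sigma = 1\)):

```python
from scipy import stats

# 2.5% and 97.5% quantiles of the standard t with nu = 1 (a Cauchy).
lo, hi = stats.t.ppf([0.025, 0.975], df=1)
print(lo, hi)  # approximately -12.706 and 12.706

# For comparison, the standard Normal puts 95% of its mass within about +/- 1.96.
print(stats.norm.ppf([0.025, 0.975]))
```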
Moments¶
Mean¶
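\[
E[X] = \mu \qquad \text{for } \nu > 1
\]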
The mean is defined only if \(\nu > 1\); for \(\nu \le 1\) the tails are so heavy that the integral defining it does not converge.
Connections to Cauchy/Lorentz¶
If \(\nu = 1\), this distribution is also known as the Cauchy or Lorentz distribution.
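Substituting \(\nu = 1\) into the PDF above gives
\[
p(x \mid \mu, \sigma) = \frac{1}{\pi\sigma\left[1 + \left(\frac{x-\mu}{\sigma}\right)^2\right]}
\]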
Distribution¶
Let \(Z \sim N(0,1)\) and \(V \sim \chi^2_n\), where \(Z \perp V\). Then we can construct a Student t random variable with \(n\) degrees of freedom as:
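\[
T = \frac{Z}{\sqrt{V/n}} \sim t_n
\]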
Symmetry around arguments¶
The Student t distribution is symmetric in its first two arguments, since the density depends on \(x\) and \(\mu\) only through \((x - \mu)^2\):
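\[
t_\nu(x \mid \mu, \sigma^2) = t_\nu(\mu \mid x, \sigma^2)
\]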
t-distribution in place of the Normal for Bayesian modelling¶
The t-distribution has heavy tails and can be used to accommodate (see the sketch after this list):
occasional unusual observations in the data distribution
occasional extreme parameters in the prior distribution of a hierarchical model
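As an illustration of the first point, here is a minimal PyMC sketch of a robust location model; the data and priors are hypothetical choices for illustration, not taken from the source:

```python
import numpy as np
import pymc as pm

# Hypothetical data: mostly well-behaved observations plus a few outliers.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, size=97), [15.0, -12.0, 20.0]])

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)    # center
    sigma = pm.HalfNormal("sigma", sigma=10.0)  # scale
    nu = pm.Gamma("nu", alpha=2.0, beta=0.1)    # degrees of freedom
    # Heavy-tailed likelihood: the outliers are absorbed by the tails instead
    # of dragging mu around and inflating sigma, as a Normal likelihood would.
    pm.StudentT("obs", nu=nu, mu=mu, sigma=sigma, observed=y)
    idata = pm.sample()
```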
The t-distribution has three parameters, \(t_\nu(\mu, \sigma^2)\):
\(\mu\) is the center
\(\sigma\) is the scale
\(\nu\) is the degrees of freedom, \(\nu \in (0, \infty)\)
If we fit a t-distribution to a large amount of data, we can also estimate the degrees of freedom from the data, as sketched below.
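This is not the Bayesian fit itself, just a quick maximum-likelihood sanity check with scipy on hypothetical simulated data, so the estimates can be compared with the known truth:

```python
import numpy as np
from scipy import stats

# Simulate a large sample from t_4(mu=1, sigma=2) and recover the parameters,
# including the degrees of freedom, by maximum likelihood.
rng = np.random.default_rng(0)
y = stats.t.rvs(df=4, loc=1.0, scale=2.0, size=50_000, random_state=rng)

df_hat, loc_hat, scale_hat = stats.t.fit(y)
print(df_hat, loc_hat, scale_hat)  # should be close to 4, 1 and 2
```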
t-distribution as a mixture of Normals¶
We can draw samples \(y_i \sim t_\nu(\mu, \sigma^2)\) using a scale mixture of Normals, e.g. with a scaled inverse-\(\chi^2\) mixing distribution (a numerical check follows the list below):
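\[
\begin{aligned}
y_i \mid \mu, V_i &\sim N(\mu, V_i) \\
V_i \mid \nu, \sigma^2 &\sim \text{Inv-}\chi^2(\nu, \sigma^2)
\end{aligned}
\]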
\(V_i\) are the auxiliary variables
\(\nu\) is the degrees of freedom of the t-distribution
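A small numpy check of this construction, with hypothetical values for \(\mu\), \(\sigma\) and \(\nu\); the quantiles of the mixture draws should match scipy's t-distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, nu = 1.0, 2.0, 4.0
n = 100_000

# V_i ~ Inv-chi^2(nu, sigma^2): draw chi^2_nu variates and invert/rescale them.
V = nu * sigma**2 / rng.chisquare(nu, size=n)
# y_i | V_i ~ N(mu, V_i), where V_i is the variance of the i-th draw.
y = rng.normal(mu, np.sqrt(V))

# Marginally, y should be distributed as t_nu(mu, sigma^2).
print(np.quantile(y, [0.05, 0.5, 0.95]))
print(stats.t.ppf([0.05, 0.5, 0.95], df=nu, loc=mu, scale=sigma))
```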
We can then sample from the posterior \(p(\mu, \sigma^2, \nu \mid y)\), treating the \(V_i\) as auxiliary variables.
Unfortunately this is not very efficient, because \(\sigma\) and the \(V_i\) are dependent: if \(\sigma\) is close to zero we will sample \(V_i\) close to zero, and vice versa.
We can solve this by adding an extra parameter. This extra random parameter results in a random walk on a larger space, which, perhaps surprisingly, improves convergence (this trick is known as parameter expansion). Thus our new model is:
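\[
\begin{aligned}
y_i \mid \mu, \alpha, U_i &\sim N(\mu, \alpha^2 U_i) \\
U_i \mid \nu, \tau^2 &\sim \text{Inv-}\chi^2(\nu, \tau^2)
\end{aligned}
\]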
\(\alpha\) is the additional scale parameter; it has no meaning on its own, so we can use a uniform prior on the log scale
\(\alpha^2 U_i\) are the auxiliary variables
\(\alpha \tau\) plays the role of \(\sigma\) in the original model
In the end we only need \(\mu\) and \(\sigma = \alpha\tau\); the remaining parameters can be discarded.