Inferring the parameters of an MVN¶
We assume that \(D = \{x_1, \cdots, x_N\}\) is a set of observations with \(x_i \sim \mathcal{N}(\mu, \Sigma)\). We may want to find:
\(P(\mu|D, \Sigma)\)
\(P(\Sigma|D, \mu)\)
\(P(\mu, \Sigma|D)\)
\(P(\mu|D, \Sigma)\)¶
Likelihood¶
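The likelihood, viewed as a function of \(\mu\) with \(\Sigma\) fixed, depends on the data only through the sample mean \(\bar{x} = \frac{1}{N}\sum_{i=1}^N x_i\) (a standard result, stated here since it is what makes the posterior below a linear Gaussian system):
\[p(D|\mu, \Sigma) = \prod_{i=1}^N \mathcal{N}(x_i|\mu, \Sigma) \propto \mathcal{N}\left(\bar{x} \,\middle|\, \mu, \tfrac{1}{N}\Sigma\right)\]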
Prior¶
A Gaussian prior, \(p(\mu) = \mathcal{N}(\mu|m_0, V_0)\), is the conjugate prior for a Gaussian likelihood with known covariance.
We can derive the Gaussian posterior for \(\mu\):
This becomes a linear Gaussian system:
\[p(\mu|D, \Sigma) = \mathcal{N}(\mu|m_N, V_N)\]
Where:
\(V^{-1}_N = V_0^{-1} + N \Sigma^{-1}\)
\(m_N = V_N (\Sigma^{-1} (N \bar{x}) + V_0^{-1}m_0)\)
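As a concrete illustration, here is a minimal NumPy sketch of this update; the data, the prior values \(m_0, V_0\), and the assumed-known \(\Sigma\) are all made-up example values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup (illustrative values): known covariance Sigma,
# N observations drawn from N(mu_true, Sigma).
D, N = 2, 50
mu_true = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
X = rng.multivariate_normal(mu_true, Sigma, size=N)
x_bar = X.mean(axis=0)

# Prior p(mu) = N(mu | m0, V0)
m0 = np.zeros(D)
V0 = 10.0 * np.eye(D)

# Posterior precision and mean:
# V_N^{-1} = V0^{-1} + N * Sigma^{-1}
# m_N = V_N (Sigma^{-1} (N x_bar) + V0^{-1} m0)
Sigma_inv = np.linalg.inv(Sigma)
VN = np.linalg.inv(np.linalg.inv(V0) + N * Sigma_inv)
mN = VN @ (Sigma_inv @ (N * x_bar) + np.linalg.inv(V0) @ m0)

print("posterior mean m_N:", mN)
```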
\(P(\Sigma|D, \mu)\)¶
Likelihood¶
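With \(\mu\) fixed, the likelihood as a function of \(\Sigma\) can be written in terms of the data scatter matrix \(S_{\mu} \triangleq \sum_{i=1}^N (x_i - \mu)(x_i - \mu)^T\) (a standard form, included here since \(S_\mu\) appears in the posterior below):
\[p(D|\mu, \Sigma) \propto |\Sigma|^{-N/2} \exp\left(-\frac{1}{2}\mathrm{tr}(S_{\mu} \Sigma^{-1})\right)\]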
Prior¶
The conjugate prior is the inverse Wishart distribution:
\[\mathrm{IW}(\Sigma|S_0, v_0) \propto |\Sigma|^{-(v_0 + D + 1)/2} \exp\left(-\frac{1}{2}\mathrm{tr}(S_0^{-1}\Sigma^{-1})\right)\]
Where:
\(v_0 > D - 1\) is the degrees of freedom
\(S_0\) is a symmetric pd matrix
\(S_0^{-1}\) plays the role of the prior scatter matrix
\(N_0 \triangleq v_0 + D +1\) controls the strength of the prior.
Posterior¶
The posterior is again inverse Wishart, \(p(\Sigma|D, \mu) = \mathrm{IW}(\Sigma|S_N, v_N)\), where:
\(v_N = v_0 + N\)
\(S_N^{-1} = S_0^{-1} + S_{\mu}\)
Hence the posterior strength \(v_N\) is the prior strength \(v_0\) plus the number of observations \(N\), and the posterior scatter matrix \(S_N^{-1}\) is the prior scatter matrix \(S_0^{-1}\) plus the data scatter matrix \(S_\mu\).
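A minimal sketch of this update with scipy (prior values are illustrative); note that scipy's `invwishart(df, scale)` puts the scatter matrix in `scale`, i.e. `scale` corresponds to \(S^{-1}\) in the \(\mathrm{IW}(\Sigma|S, v)\) parameterization used above:

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(1)

# Synthetic data with a known mean mu (illustrative values).
D, N = 2, 100
mu = np.zeros(D)
Sigma_true = np.array([[2.0, 0.5], [0.5, 1.0]])
X = rng.multivariate_normal(mu, Sigma_true, size=N)

# Prior IW(Sigma | S0, v0); its scatter matrix is S0^{-1}.
v0 = D + 2
S0_inv = np.eye(D)  # prior scatter matrix (assumed value)

# Data scatter matrix S_mu = sum_i (x_i - mu)(x_i - mu)^T
S_mu = (X - mu).T @ (X - mu)

# Posterior: v_N = v0 + N and scatter S_N^{-1} = S0^{-1} + S_mu
vN = v0 + N
SN_inv = S0_inv + S_mu

# scipy's `scale` is the matrix inside the trace term, i.e. S_N^{-1}.
posterior = invwishart(df=vN, scale=SN_inv)
print("posterior mean of Sigma:\n", posterior.mean())
```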
\(P(\mu, \Sigma| D)\)¶
Likelihood¶
We can re-express the term in the exponent using the following fact:
\[\sum_{i=1}^N (x_i - \mu)^T \Sigma^{-1} (x_i - \mu) = \mathrm{tr}(\Sigma^{-1} S_{\bar{x}}) + N(\bar{x} - \mu)^T \Sigma^{-1} (\bar{x} - \mu)\]
where \(S_{\bar{x}} \triangleq \sum_{i=1}^N (x_i - \bar{x})(x_i - \bar{x})^T\) is the centered scatter matrix.
Prior¶
We could try the obvious factored prior:
\[p(\mu, \Sigma) = \mathcal{N}(\mu|m_0, V_0)\, \mathrm{IW}(\Sigma|S_0, v_0)\]
Unfortunately this is not a conjugate prior; it is only semi-conjugate or conditionally conjugate, since both \(p(\mu|\Sigma)\) and \(p(\Sigma|\mu)\) are individually conjugate.
To create a conjugate prior, we need to use a prior where \(\mu\) and \(\Sigma\) are dependent on each other. We will use a joint distribution of the form \(p(\mu, \Sigma) = p(\Sigma)\, p(\mu|\Sigma)\), which is a Normal-inverse-Wishart (NIW) distribution, defined as (see the sampling sketch after the parameter list below):
\[\mathrm{NIW}(\mu, \Sigma | m_0, k_0, v_0, S_0) \triangleq \mathcal{N}\left(\mu \,\middle|\, m_0, \tfrac{1}{k_0}\Sigma\right)\, \mathrm{IW}(\Sigma | S_0, v_0)\]
Where:
\(m_0\) is our prior mean for \(\mu\)
\(k_0\) is how strongly we believe in the prior of the mean
\(S_0\) is the prior for \(\Sigma\)
\(v_0\) is how strongly we believe in the prior for \(\Sigma\)
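To make the dependence between \(\mu\) and \(\Sigma\) concrete, here is a minimal sketch of ancestral sampling from the NIW prior (hyperparameter values are assumptions for the example):

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(3)

# NIW prior hyperparameters (assumed example values)
D = 2
m0 = np.zeros(D)
k0 = 1.0
v0 = D + 2
S0 = np.eye(D)

# Sample Sigma ~ IW(S0, v0). scipy's `scale` is the matrix inside the
# trace term, i.e. S0^{-1} in the parameterization used in this section.
Sigma = invwishart(df=v0, scale=np.linalg.inv(S0)).rvs(random_state=rng)

# Sample mu | Sigma ~ N(m0, Sigma / k0): the prior on mu scales with Sigma.
mu = rng.multivariate_normal(m0, Sigma / k0)

print("sampled mu:", mu)
print("sampled Sigma:\n", Sigma)
```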
Posterior¶
The posterior is \(p(\mu, \Sigma|D) = \mathrm{NIW}(\mu, \Sigma| m_N, k_N, v_N, S_N)\), where:
\(m_N = \frac{k_0 m_0 + N \bar{x}}{k_N} = \frac{k_0}{k_0 + N}m_0 + \frac{N}{k_0 + N}\bar{x}\)
\(k_N = k_0 + N\)
\(v_N = v_0 + N\)
\(S_N = S_0 + S_{\bar{x}} + \frac{k_0 N}{k_0 + N}(\bar{x} - m_0)(\bar{x} - m_0)^T = S_0 + S + k_0 m_0 m_0^T - k_N m_N m_N^T\)
\(S \triangleq \sum_{i=1}^N x_i x_i^T\) is the uncentered sum-of-squares matrix, and \(S_{\bar{x}}\) is the centered scatter matrix defined above.
Here we can see that the posterior mean is a convex combination of the prior mean and the MLE, with strength \(k_N = k_0 + N\), and the posterior scatter matrix \(S_N\) is the prior scatter matrix \(S_0\) plus the empirical scatter matrix \(S_{\bar{x}}\) plus an extra term due to the uncertainty in the mean.
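A minimal NumPy sketch of the NIW update (prior values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data (illustrative values).
D, N = 2, 100
X = rng.multivariate_normal(np.array([1.0, -1.0]), np.eye(D), size=N)
x_bar = X.mean(axis=0)
S = X.T @ X                               # uncentered sum of squares
S_xbar = S - N * np.outer(x_bar, x_bar)   # centered scatter matrix

# NIW prior hyperparameters (assumed values)
m0 = np.zeros(D)
k0 = 1.0
v0 = D + 2
S0 = np.eye(D)

# Posterior hyperparameters
kN = k0 + N
vN = v0 + N
mN = (k0 * m0 + N * x_bar) / kN
SN = S0 + S_xbar + (k0 * N / kN) * np.outer(x_bar - m0, x_bar - m0)

print("m_N:", mN)
print("S_N:\n", SN)
```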
Posterior marginals¶
The posterior marginal for \(\mu\) is a multivariate Student T distribution:
\[p(\mu|D) = \mathcal{T}\left(\mu \,\middle|\, m_N, \frac{S_N}{k_N (v_N - D + 1)}, v_N - D + 1\right)\]
This follows from the fact that the Student distribution can be represented as a scale mixture of Gaussians.
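For completeness, a small sketch evaluating this marginal with scipy's `multivariate_t` (the posterior hyperparameter values below are placeholders, e.g. produced by an update like the one sketched above):

```python
import numpy as np
from scipy.stats import multivariate_t

# Placeholder posterior hyperparameters (illustrative values).
D = 2
mN = np.array([0.9, -1.1])
kN, vN = 101.0, 104
SN = np.array([[1.2, 0.1], [0.1, 0.9]])

# p(mu|D) = T(mu | m_N, S_N / (k_N (v_N - D + 1)), v_N - D + 1)
df = vN - D + 1
shape = SN / (kN * df)
marginal = multivariate_t(loc=mN, shape=shape, df=df)

print("density at m_N:", marginal.pdf(mN))
```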
Posterior predictive¶
Univariate case¶
Here we assume that \(P(\mu, \sigma^2| D)\) follows a normal-inverse-chi-squared (NIX) distribution.
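Under this NIX posterior, with hyperparameters \(m_N, k_N, v_N, \sigma_N^2\) (the univariate analogues of the NIW updates above), the posterior predictive is a univariate Student T; this is a standard result, stated here for completeness:
\[p(x|D) = \mathcal{T}\left(x \,\middle|\, m_N, \frac{(1 + k_N)\,\sigma_N^2}{k_N}, v_N\right)\]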