Noisy Gaussian Process Regression

Here we assume:

\(y = f(x) + \epsilon\)

  • \(\epsilon \sim N(0, \sigma^2_y)\)

Because of the noise term, the model is no longer required to interpolate the training data exactly; it only has to pass close to the observed values.
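As a concrete illustration, the snippet below draws noisy observations from a latent function. The choices \(f = \sin\) and \(\sigma_y = 0.1\) are arbitrary stand-ins, not fixed by the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent function and noise level (both assumptions for illustration).
f = np.sin        # noise-free function f(x)
sigma_y = 0.1     # observation noise standard deviation

X = rng.uniform(-3, 3, size=20)                 # training inputs
y = f(X) + sigma_y * rng.normal(size=X.shape)   # y = f(x) + eps, eps ~ N(0, sigma_y^2)
```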

The covariance of this model is given by:

\[\begin{split} \mathrm{cov}[y_p, y_q] = \kappa(x_p, x_q) + \sigma_y^2 \delta_{pq} \\ \mathrm{cov}[y|X] = K + \sigma_y^2 I_N = K_y \end{split}\]
  • \(\delta_{pq} = I(p = q)\)

The second term is diagonal because we assume the noise is added independently to each observation.
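A minimal sketch of building \(K_y\), assuming a squared-exponential kernel (the text does not fix a particular kernel) and reusing `X` and `sigma_y` from the snippet above:

```python
import numpy as np

def rbf_kernel(A, B, ell=1.0, sf=1.0):
    """Squared-exponential kernel: sf^2 * exp(-(a - b)^2 / (2 ell^2))."""
    sq_dists = (A[:, None] - B[None, :]) ** 2
    return sf**2 * np.exp(-0.5 * sq_dists / ell**2)

K = rbf_kernel(X, X)                    # noise-free covariance of f at the training inputs
K_y = K + sigma_y**2 * np.eye(len(X))   # cov[y | X] = K + sigma_y^2 I_N
```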

The joint density of the observed data and the latent, noise-free function values \(f_*\) at the test points is given by:

\[\begin{split} \begin{pmatrix} y \\ f_* \end{pmatrix} \sim \mathcal{N}\left(0, \begin{pmatrix} K_y & K_* \\ K_*^T & K_{**} \end{pmatrix}\right) \end{split}\]

Here we assume the mean is zero for notational simplicity. Conditioning this joint Gaussian on the observed \(y\) yields the posterior predictive density:

\[\begin{split} p(f_*| x_*, X, y) = \mathcal{N}(f_*|\mu_*, \Sigma_*) \\ \mu_* = K_*^T K_y^{-1} y \\ \Sigma_* = K_{**} - K^T_* K_y^{-1} K_* \end{split}\]
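These expressions are an instance of the standard result for conditioning a partitioned Gaussian, stated here for reference with \(a = y\), \(b = f_*\), \(A = K_y\), \(B = K_{**}\), \(C = K_*\):

\[\begin{split} \begin{pmatrix} a \\ b \end{pmatrix} \sim \mathcal{N}\left(0, \begin{pmatrix} A & C \\ C^T & B \end{pmatrix}\right) \implies p(b | a) = \mathcal{N}(b \,|\, C^T A^{-1} a,\; B - C^T A^{-1} C) \end{split}\]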

We can write the posterior mean as:

\[\hat{f}_* = K_*^T K_y^{-1} y = \sum_{i=1}^N \alpha_i \, \kappa(x_i, x_*)\]
  • \(\alpha = K_y^{-1}y\)
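Continuing the snippets above, here is one way to compute \(\mu_*\), \(\Sigma_*\), and the weights \(\alpha\). A Cholesky factorization is used in place of an explicit inverse for numerical stability; the kernel choice remains an assumption:

```python
from scipy.linalg import cho_factor, cho_solve

X_star = np.linspace(-3, 3, 100)     # test inputs x_*

K_star = rbf_kernel(X, X_star)       # K_*  : N x N_* cross-covariance
K_ss = rbf_kernel(X_star, X_star)    # K_** : covariance among test points

# Solve K_y alpha = y via Cholesky instead of forming K_y^{-1} explicitly.
c, low = cho_factor(K_y)
alpha = cho_solve((c, low), y)       # alpha = K_y^{-1} y

mu_star = K_star.T @ alpha                                   # K_*^T K_y^{-1} y
Sigma_star = K_ss - K_star.T @ cho_solve((c, low), K_star)   # K_** - K_*^T K_y^{-1} K_*
```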