Gaussian Process Regression

A Gaussian process (GP) defines a prior over the regression function \(f\):

\[ f(x) \sim GP(m(x), \mathcal{k}(x, x')) \]

Here \(m(x)\) is the mean function and \(\mathcal{k}(x, x')\) is the kernel (covariance) function. They are required to satisfy:
- \(m(x) = E[f(x)]\)
- \(\mathcal{k}(x, x') = E[(f(x) - m(x))(f(x') - m(x'))^T]\)
- \(\mathcal{k}\) is a positive definite kernel.

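For any finite set of points \(X = \{x_1, \dots, x_N\}\), this process defines a joint Gaussian over the function values \(f = (f(x_1), \dots, f(x_N))\), with \(\mu_i = m(x_i)\) and \(K_{ij} = \mathcal{k}(x_i, x_j)\):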

\[ p(f|X)= \mathcal{N}(f|\mu, K) \]

It is common to use the zero mean function \(m(x) = 0\), since the GP is flexible enough to model the mean arbitrarily well.
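The finite-dimensional view above can be made concrete by drawing function values from the prior at a grid of inputs. The following is a minimal sketch, assuming a zero mean and a squared-exponential kernel; the kernel choice, the helper names `rbf_kernel` and `sample_prior`, and the jitter value are illustrative assumptions, not part of the text above.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel for 1-D inputs (an assumed, common choice)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def sample_prior(x, n_samples=3, jitter=1e-6, seed=0):
    """Draw f ~ N(0, K) at the finite set of points x (zero mean function)."""
    rng = np.random.default_rng(seed)
    # A small jitter on the diagonal keeps the Cholesky factorization stable.
    K = rbf_kernel(x, x) + jitter * np.eye(len(x))
    L = np.linalg.cholesky(K)
    # If z ~ N(0, I), then L z ~ N(0, L L^T) = N(0, K).
    z = rng.standard_normal((len(x), n_samples))
    return (L @ z).T  # one sampled function per row

x = np.linspace(-3.0, 3.0, 50)
samples = sample_prior(x)  # array of shape (3, 50)
```

Each row of `samples` is one draw of the function values at the 50 grid points; plotting the rows against `x` gives a feel for what the prior considers plausible.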

Noise-free observations

We assume that the observations are noise free. In this case the Gaussian process must interpolate the observations exactly.
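The interpolation property can be checked numerically with the standard Gaussian conditioning formulas. This is a sketch under assumptions: a zero mean function, a squared-exponential kernel, and toy data from `np.sin`. The function name `gp_posterior_noise_free` is hypothetical.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0):
    """Squared-exponential kernel for 1-D inputs (assumed choice)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior_noise_free(x_train, y_train, x_test, jitter=1e-10):
    """Posterior mean and covariance of a zero-mean GP given noise-free data."""
    # The tiny jitter is for numerical stability only, not a noise model.
    K = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)   # cross-covariance train/test
    K_ss = rbf_kernel(x_test, x_test)   # test covariance
    mean = K_s.T @ np.linalg.solve(K, y_train)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, cov

x_train = np.array([-2.0, -1.0, 0.5, 2.0])
y_train = np.sin(x_train)

# Predicting at the training inputs themselves reproduces the observations,
# and the posterior variance there collapses to (numerically) zero.
mean, cov = gp_posterior_noise_free(x_train, y_train, x_train)
```

Away from the training inputs the posterior variance grows again, which reflects the uncertainty between observations.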

With noisy observations, we do not require that the Gaussian process interpolates the observations, only that it stays close to them.
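When the observations are noisy, a common way to relax the interpolation constraint is to add the noise variance to the diagonal of the training covariance. The sketch below assumes i.i.d. Gaussian noise with variance `noise_var`, a zero mean function, and a squared-exponential kernel; all names and values are illustrative.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0):
    """Squared-exponential kernel for 1-D inputs (assumed choice)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior_noisy(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior over the latent function given y = f(x) + Gaussian noise."""
    # The noise variance enters only through the training covariance K_y.
    K_y = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    mean = K_s.T @ np.linalg.solve(K_y, y_train)
    cov = K_ss - K_s.T @ np.linalg.solve(K_y, K_s)
    return mean, cov

x_train = np.array([-2.0, -1.0, 0.5, 2.0])
y_train = np.sin(x_train)

# At the training inputs the mean is close to, but not equal to, y_train.
mean, cov = gp_posterior_noisy(x_train, y_train, x_train)
max_gap = float(np.max(np.abs(mean - y_train)))
```

Here `max_gap` is small but nonzero, and the posterior variance at the training inputs stays strictly positive instead of collapsing to zero.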

The predictive performance of GPs depends exclusively on the suitability of the chosen kernel.
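To illustrate how strongly predictions depend on the kernel, the sketch below fits the same toy data with two length scales of a squared-exponential kernel and compares the error at points between the training inputs. The data, the two length scales, and the helper `posterior_mean` are assumptions chosen for illustration only.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale):
    """Squared-exponential kernel for 1-D inputs (assumed choice)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

def posterior_mean(x_train, y_train, x_test, length_scale, noise_var=1e-4):
    """Posterior mean of a zero-mean GP with a small assumed noise variance."""
    K_y = rbf_kernel(x_train, x_train, length_scale) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, length_scale)
    return K_s.T @ np.linalg.solve(K_y, y_train)

x_train = np.linspace(0.0, 5.0, 11)
y_train = np.sin(x_train)
x_test = x_train[:-1] + 0.25  # midpoints between neighbouring training inputs
y_true = np.sin(x_test)

# Root-mean-square error at the midpoints for each length scale.
rmse = {}
for length_scale in (0.05, 1.0):
    pred = posterior_mean(x_train, y_train, x_test, length_scale)
    rmse[length_scale] = float(np.sqrt(np.mean((pred - y_true) ** 2)))
```

With the very short length scale the training points are almost uncorrelated with the midpoints, so the predictions there fall back toward the zero prior mean. The longer length scale lets neighbouring observations inform the prediction.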

We assume that the mean of the process is a linear model:

\[ f(x) = \beta^T \phi(x) + r(x) \]

Here \(r(x) \sim GP(0, \mathcal{k}(x, x'))\) models the residuals, so we combine a parametric and a nonparametric model (a semi-parametric model).
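One way to work with such a model is to place a Gaussian prior on the weights and integrate them out, which yields another GP whose kernel is the sum of the residual kernel and a linear term. The sketch below assumes \(\beta \sim \mathcal{N}(0, B)\), basis functions \(\phi(x) = (1, x)\), and a squared-exponential kernel for \(r(x)\). None of these choices come from the text above, and all function names are hypothetical.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0):
    """Squared-exponential kernel for the residual process r(x) (assumed)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

def phi(x):
    """Assumed basis functions phi(x) = [1, x], i.e. a linear trend."""
    return np.stack([np.ones_like(x), x], axis=1)

def semi_parametric_kernel(xa, xb, beta_cov):
    """Kernel of f = beta^T phi + r after integrating out beta ~ N(0, beta_cov)."""
    return rbf_kernel(xa, xb) + phi(xa) @ beta_cov @ phi(xb).T

def predict_mean(x_train, y_train, x_test, beta_cov, noise_var=1e-2):
    """Posterior mean under the combined kernel with assumed Gaussian noise."""
    K_y = semi_parametric_kernel(x_train, x_train, beta_cov)
    K_y = K_y + noise_var * np.eye(len(x_train))
    K_s = semi_parametric_kernel(x_train, x_test, beta_cov)
    return K_s.T @ np.linalg.solve(K_y, y_train)

# Toy data: a linear trend plus a smooth wiggle.
x_train = np.linspace(0.0, 5.0, 15)
y_train = 2.0 * x_train + np.sin(3.0 * x_train)

# Predict well outside the training range.
x_far = np.array([8.0])
mean_far = predict_mean(x_train, y_train, x_far, beta_cov=10.0 * np.eye(2))
```

The linear part of the kernel lets the prediction keep following the trend away from the data, while the residual kernel captures local structure near the observations.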