Comparing Gaussian processes to other models¶
Linear regression models¶
If we have a Bayesian linear regression over \(D\)-dimensional features, with a prior over the weights \(p(w) = \mathcal{N}(0, \Sigma)\), the posterior over the weights is also Gaussian, \(p(w \mid X, y) = \mathcal{N}\!\left(\sigma_y^{-2} A^{-1} X^T y,\; A^{-1}\right)\), where:
\(A = \sigma_y^{-2} X^T X + \Sigma^{-1}\)
It can be shown that this is equivalent to a GP with covariance function \(k(x,x') = x^T \Sigma x'\). However, this is a degenerate covariance function, since it has at most \(D\) non-zero eigenvalues. Thus this model can only represent a limited class of functions, namely linear functions of the \(D\) features.
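This equivalence is easy to check numerically. Below is a minimal sketch (NumPy, synthetic data; the dimensions, the prior \(\Sigma\), and the noise level are arbitrary choices for the illustration) that computes the predictive distribution both from the weight-space posterior above and from a GP with \(k(x,x') = x^T \Sigma x'\), and also confirms that the Gram matrix on \(n > D\) points has rank at most \(D\).

```python
# Minimal sketch: weight-space vs. function-space view of Bayesian linear regression.
# All sizes, the prior Sigma, and the noise level below are arbitrary illustration values.
import numpy as np

rng = np.random.default_rng(0)
D, n, n_star = 3, 20, 5                     # feature dim, train points, test points
sigma_y = 0.1                               # observation noise std
Sigma = np.diag([1.0, 0.5, 2.0])            # prior covariance over the weights

X = rng.normal(size=(n, D))                 # rows are training inputs
X_star = rng.normal(size=(n_star, D))       # rows are test inputs
w_true = rng.normal(size=D)
y = X @ w_true + sigma_y * rng.normal(size=n)

# --- Weight-space view: posterior over w, pushed through x -> x^T w ---
A = X.T @ X / sigma_y**2 + np.linalg.inv(Sigma)
A_inv = np.linalg.inv(A)
w_mean = A_inv @ X.T @ y / sigma_y**2
f_mean_w = X_star @ w_mean
f_cov_w = X_star @ A_inv @ X_star.T

# --- Function-space view: GP with the (degenerate) linear kernel ---
def k(A_, B_):
    """Linear kernel k(x, x') = x^T Sigma x', applied row-wise to two input matrices."""
    return A_ @ Sigma @ B_.T

K = k(X, X) + sigma_y**2 * np.eye(n)
f_mean_gp = k(X_star, X) @ np.linalg.solve(K, y)
f_cov_gp = k(X_star, X_star) - k(X_star, X) @ np.linalg.solve(K, k(X, X_star))

print(np.allclose(f_mean_w, f_mean_gp))     # True: identical predictive means
print(np.allclose(f_cov_w, f_cov_gp))       # True: identical predictive covariances

# Degeneracy: the Gram matrix on n > D points has at most D non-zero eigenvalues.
print(np.linalg.matrix_rank(k(X, X)))       # 3 (= D), even though the matrix is 20 x 20
```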
Neural networks¶
Given a single-hidden-layer neural network with \(H\) hidden units, \(f(x) = b + \sum_{j=1}^{H} v_j \, g(x; u_j)\), where:
- \(b\) is the bias
- \(v_j\) is the output weight from hidden unit \(j\) to the response \(y\)
- \(u_j\) are the input weights to hidden unit \(j\) from the input \(x\)
- \(g\) is the hidden-unit activation function (any smooth function)
If we assume independent zero-mean priors \(b \sim \mathcal{N}(0, \sigma_b^2)\), \(v_j \sim \mathcal{N}(0, \sigma_v^2)\), and i.i.d. priors on the input weights \(u_j\), and denote all the weights collectively by \(\theta\), we have:

$$
\begin{aligned}
E_{\theta}[f(x)] &= 0 \\
E_{\theta}[f(x)f(x')] &= \sigma^2_b + \sum_j \sigma^2_v\, E_u[g(x; u_j)\, g(x'; u_j)] \\
&= \sigma^2_b + H \sigma^2_v\, E_u[g(x; u)\, g(x'; u)]
\end{aligned}
$$
In the limit as the number of hidden units \(H \rightarrow \infty\), with \(\sigma_v^2\) scaled as \(1/H\) so that \(H\sigma_v^2\) remains finite, the central limit theorem gives a Gaussian process with the covariance function above.
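The covariance identity above holds exactly for any finite \(H\) (only the Gaussianity needs the limit), so it can be verified by sampling networks from the prior. Below is a minimal sketch; the one-dimensional inputs, the \(\tanh\) activation with no hidden bias, and all variance values are assumptions made purely for the illustration, not part of the derivation.

```python
# Minimal sketch of the neural-network prior covariance, assuming:
# 1-D inputs, g(x; u) = tanh(u * x) (no hidden bias), b ~ N(0, sigma_b^2),
# v_j ~ N(0, sigma_v^2), u_j ~ N(0, sigma_u^2) i.i.d.  All values are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
H = 200                                    # number of hidden units
sigma_b = 0.5
sigma_v = 1.0 / np.sqrt(H)                 # scaled so that H * sigma_v^2 stays finite
sigma_u = 2.0                              # std of the i.i.d. input-weight prior
g = np.tanh                                # hidden-unit activation

x, x_prime = 0.3, -1.2                     # two fixed inputs
n_nets = 20_000                            # networks sampled from the prior

# Sample n_nets networks and evaluate f(x) = b + sum_j v_j g(u_j x) at both inputs.
b = sigma_b * rng.normal(size=n_nets)
v = sigma_v * rng.normal(size=(n_nets, H))
u = sigma_u * rng.normal(size=(n_nets, H))
f_x = b + np.sum(v * g(u * x), axis=1)
f_xp = b + np.sum(v * g(u * x_prime), axis=1)

# Empirical prior covariance of (f(x), f(x')) across networks.
emp_cov = np.mean(f_x * f_xp) - np.mean(f_x) * np.mean(f_xp)

# Kernel sigma_b^2 + H sigma_v^2 E_u[g(x;u) g(x';u)], with E_u estimated by Monte Carlo.
u_mc = sigma_u * rng.normal(size=500_000)
kernel = sigma_b**2 + H * sigma_v**2 * np.mean(g(u_mc * x) * g(u_mc * x_prime))

# The two printed values should agree up to Monte Carlo error.
print(f"empirical cov(f(x), f(x')):       {emp_cov:.2f}")
print(f"sigma_b^2 + H sigma_v^2 E_u[g g]: {kernel:.2f}")

# As H grows, f(x) becomes Gaussian (CLT); the excess kurtosis should be near 0.
z = (f_x - f_x.mean()) / f_x.std()
print(f"excess kurtosis of f(x): {np.mean(z**4) - 3:.3f}")
```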