Comparing Gaussian processes to other models¶

Linear regression models¶

If we have a Bayesian linear regression model for D-dimensional features, with prior over the weights \(p(w) = \mathcal{N}(0, \Sigma)\), the posterior predictive distribution at a test input \(x_*\) can be expressed as:

\[\begin{split} p(f_*| x_*, X, y) = \mathcal{N}(\mu, \sigma^2) \\ \mu = \frac{1}{\sigma^2_y}x_*^TA^{-1}X^Ty \\ \sigma^2 = x_*^TA^{-1}x_* \end{split}\]
  • \(A = \sigma_y^{-2}X^TX+\Sigma^{-1}\)

It can be shown that this is equivalent to a GP with covariance function \(k(x,x')= x^T\Sigma x'\). However, this is a degenerate covariance function, since it has at most D non-zero eigenvalues. Thus this model can represent only a limited class of functions.
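As a quick numerical sanity check of this equivalence, the sketch below computes the posterior predictive mean and variance both from the weight-space formula above and from the standard GP predictive equations with the linear kernel \(k(x,x') = x^T\Sigma x'\), and verifies that they agree. The data, noise level \(\sigma_y\), and prior \(\Sigma\) are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n observations with D features (rows of X are observations).
n, D = 20, 3
X = rng.normal(size=(n, D))
w_true = rng.normal(size=D)
sigma_y = 0.1                        # observation noise std (illustrative)
y = X @ w_true + sigma_y * rng.normal(size=n)

Sigma = np.eye(D)                    # prior covariance of the weights (illustrative)
x_star = rng.normal(size=D)          # test input

# Weight-space view: A = sigma_y^{-2} X^T X + Sigma^{-1}
A = X.T @ X / sigma_y**2 + np.linalg.inv(Sigma)
mu_w = (x_star @ np.linalg.solve(A, X.T @ y)) / sigma_y**2
var_w = x_star @ np.linalg.solve(A, x_star)

# Function-space (GP) view with linear kernel k(x, x') = x^T Sigma x'
K = X @ Sigma @ X.T                  # train/train covariance
k_star = X @ Sigma @ x_star          # train/test covariance
k_ss = x_star @ Sigma @ x_star       # test/test covariance
Ky = K + sigma_y**2 * np.eye(n)
mu_f = k_star @ np.linalg.solve(Ky, y)
var_f = k_ss - k_star @ np.linalg.solve(Ky, k_star)

print(np.allclose(mu_w, mu_f), np.allclose(var_w, var_f))  # True True
```

The two views give identical predictions; the weight-space form inverts a D×D matrix while the GP form inverts an n×n matrix, so which one is cheaper depends on whether D or n is larger.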

Neural networks¶

Given a single-hidden-layer neural network with \(H\) hidden units:

\[\begin{split} p(y|x,\theta) = \mathcal{N}(y| f(x;\theta), \sigma^2) \\ f(x) = b + \sum_{j=1}^H v_j g(x; u_j) \end{split}\]
  • \(b\) bias

  • \(v_j\) is the output weight from hidden unit \(j\) to the response \(y\)

  • \(u_j\) are the input weights to hidden unit \(j\) from input \(x\)

  • \(g\) is the hidden unit activation function (any smooth function)

If we assume the following priors:

\[\begin{split} b \sim \mathcal{N}(0, \sigma^2_b) \\ v \sim \prod_j \mathcal{N}(v_j|0, \sigma^2_v) \\ u \sim \prod_j p(u_j) \end{split}\]

Denoting all the weights by \(\theta\), we have:

\[\begin{split} E_{\theta}[f(x)] = 0 \\ E_{\theta}[f(x)f(x')] = \sigma^2_b + \sum_j \sigma^2_v E_u[g(x; u_j)g(x'; u_j)] \\ = \sigma^2_b + H \sigma^2_v E_u[g(x;u)g(x';u)] \end{split}\]

In the limit as the number of hidden units \(H \rightarrow \infty\) (with \(\sigma^2_v\) scaled as \(1/H\) so that the covariance above stays finite), the central limit theorem implies that \(f\) converges to a Gaussian process.
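A small Monte Carlo sketch of this limit is given below. It assumes \(g = \tanh\), a standard normal prior for each \(u_j\), and the \(1/H\) scaling of \(\sigma^2_v\); these choices are illustrative, not fixed by the derivation above. It draws many random networks from the prior, checks that the empirical mean and covariance of \(f\) match the expressions above, and shows that the excess kurtosis of \(f(x)\) shrinks toward the Gaussian value of 0 as \(H\) grows.

```python
import numpy as np

rng = np.random.default_rng(0)

sigma_b, sigma_v = 1.0, 1.0      # prior stds for the bias b and output weights v_j
x1, x2 = 0.3, -1.2               # two fixed scalar inputs at which we evaluate f

def sample_f(H, n_samples=50_000):
    """Draw random networks f(x) = b + sum_j v_j tanh(u_j x) from the prior.

    sigma_v^2 is scaled by 1/H so the prior variance of f stays finite as H grows.
    Returns f evaluated at x1 and x2 for each sampled network.
    """
    b = sigma_b * rng.normal(size=n_samples)
    v = (sigma_v / np.sqrt(H)) * rng.normal(size=(n_samples, H))
    u = rng.normal(size=(n_samples, H))          # assumed prior p(u_j) = N(0, 1)
    f1 = b + np.sum(v * np.tanh(u * x1), axis=1)
    f2 = b + np.sum(v * np.tanh(u * x2), axis=1)
    return f1, f2

for H in (1, 10, 1000):
    f1, f2 = sample_f(H)
    excess_kurtosis = np.mean((f1 - f1.mean())**4) / np.var(f1)**2 - 3
    print(f"H={H:5d}  mean={f1.mean():+.3f}  "
          f"cov={np.cov(f1, f2)[0, 1]:.3f}  excess kurtosis={excess_kurtosis:+.3f}")

# Limiting covariance sigma_b^2 + sigma_v^2 E_u[tanh(u x1) tanh(u x2)],
# estimated by Monte Carlo over u. (The factor H sigma_v^2 in the text becomes
# sigma_v^2 once the output-weight variance is scaled by 1/H.)
u = rng.normal(size=500_000)
print("limit cov:", sigma_b**2 + sigma_v**2 * np.mean(np.tanh(u * x1) * np.tanh(u * x2)))
```

Note that the mean and covariance match the closed-form expressions at every \(H\), since they are exact for finite networks; it is only the full distribution of \(f\) that becomes Gaussian as \(H \rightarrow \infty\), which the shrinking excess kurtosis illustrates.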