Recurrent neural networks conditioned on context¶

In most cases recurrent neural networks do not just model a sequence of random variables \(y^{(t)}\); they also condition on inputs \(x^{(1)}, \cdots, x^{(t)}\). Because of this, the full conditional distribution is of the form:

\[ p(y|\theta(x)) \]

The common ways to provide the extra input are:

  • as an extra input at each time step

  • as the initial state \(h^{(0)}\) (a minimal sketch of this option follows the list)

  • both of the above
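
A minimal sketch of the initial-state option, in plain numpy with hypothetical shapes and parameter names: the context vector \(x\) is mapped into the hidden space and used as \(h^{(0)}\), after which the recurrence runs without seeing \(x\) again.

```python
import numpy as np

rng = np.random.default_rng(0)

context_dim, hidden_dim, out_dim, T = 4, 8, 3, 5

# Hypothetical parameters: B maps the context into the initial state,
# W is the recurrent matrix, V produces the output at each step.
B = rng.normal(size=(hidden_dim, context_dim))
W = rng.normal(size=(hidden_dim, hidden_dim))
V = rng.normal(size=(out_dim, hidden_dim))

x = rng.normal(size=context_dim)      # the conditioning vector
h = np.tanh(B @ x)                    # h^(0) is a function of x only

ys = []
for t in range(T):
    h = np.tanh(W @ h)                # recurrence; x enters only through h^(0)
    ys.append(V @ h)                  # unnormalised output y^(t)

print(np.stack(ys).shape)             # (T, out_dim)
```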

Extra input at each time step¶

The whole vector available from the beginning

There is a single vector \(x = (x^{(1)}, \cdots, x^{(t)})\) containing the whole input. We map it with a weight matrix \(R\) and add the result to the hidden units, so the same input is available at every time step.
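
A minimal sketch of this case, again in plain numpy with hypothetical shapes: the same fixed vector \(x\), mapped by \(R\), is added to the hidden units at every step, acting like an extra \(x\)-dependent bias.

```python
import numpy as np

rng = np.random.default_rng(0)

context_dim, hidden_dim, out_dim, T = 4, 8, 3, 5

R = rng.normal(size=(hidden_dim, context_dim))  # maps x into the hidden space
W = rng.normal(size=(hidden_dim, hidden_dim))   # recurrent weights
V = rng.normal(size=(out_dim, hidden_dim))      # hidden-to-output weights

x = rng.normal(size=context_dim)                # the single conditioning vector
h = np.zeros(hidden_dim)

ys = []
for t in range(T):
    # R @ x is the same at every step, so the whole input is always visible.
    h = np.tanh(W @ h + R @ x)
    ys.append(V @ h)

print(np.stack(ys).shape)  # (T, out_dim)
```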

Only \(x^{(t)}\) at step \(t\)

This corresponds to a conditional independence assumption: the distribution factorizes as

\[ \prod_t p(y^{(t)}| x^{(1)}, \cdots, x^{(t)}) \]

To remove this assumption, we can add connections from the output at time \(t\) to the hidden units at time \(t+1\), so that \(y^{(t)}\) can also depend on the earlier outputs (see the sketch below).
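
A minimal sketch of this variant, in plain numpy with hypothetical shapes and an assumed feedback matrix `F`: the network receives \(x^{(t)}\) at each step and also feeds the previous output back into the hidden units, so the outputs are no longer conditionally independent given the inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

in_dim, hidden_dim, out_dim, T = 4, 8, 3, 5

U = rng.normal(size=(hidden_dim, in_dim))       # input-to-hidden
W = rng.normal(size=(hidden_dim, hidden_dim))   # hidden-to-hidden
F = rng.normal(size=(hidden_dim, out_dim))      # output-to-hidden feedback (assumed name)
V = rng.normal(size=(out_dim, hidden_dim))      # hidden-to-output

xs = rng.normal(size=(T, in_dim))               # the input sequence x^(1..T)
h = np.zeros(hidden_dim)
y_prev = np.zeros(out_dim)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

ys = []
for t in range(T):
    # y^(t-1) enters the hidden state, so y^(t) can depend on earlier outputs.
    h = np.tanh(W @ h + U @ xs[t] + F @ y_prev)
    y_prev = softmax(V @ h)
    ys.append(y_prev)

print(np.stack(ys).shape)  # (T, out_dim)
```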