Recurrent neural networks conditioned on context¶
In most cases a recurrent neural network is not just a model of a sequence of random variables y(t); it also receives inputs x(1),⋯,x(t). Because of this, the full conditional distribution has the form:
p(y|θ(x))
The common ways to provide the extra input are:
extra input at each time step
as the initial state h(0) (see the sketch just below)
both of the above
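A minimal NumPy sketch of the second option, conditioning only through the initial state, follows; the names, shapes, and the tanh mapping of x into h(0) are illustrative assumptions, not fixed by the text. The per-step option is detailed in the next subsection.

```python
# Minimal sketch (assumed names/shapes): condition an RNN on a fixed
# context vector x only through the initial hidden state h(0).
import numpy as np

rng = np.random.default_rng(0)
hidden_size, input_size, context_size, T = 8, 5, 4, 10

W = rng.normal(size=(hidden_size, hidden_size))   # hidden-to-hidden weights
U = rng.normal(size=(hidden_size, input_size))    # input-to-hidden weights
C = rng.normal(size=(hidden_size, context_size))  # context-to-initial-state weights
b = np.zeros(hidden_size)

x = rng.normal(size=context_size)           # fixed context vector
inputs = rng.normal(size=(T, input_size))   # ordinary per-step inputs

h = np.tanh(C @ x)                          # h(0) carries the context
for u in inputs:
    h = np.tanh(W @ h + U @ u + b)          # standard recurrence afterwards
```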
Extra input at each time step¶
The whole vector from the beginning
There is a single vector x, available from the beginning. We map it with a weight matrix R, so the product Rx reaches the hidden units at every time step; since x is fixed, it effectively acts as an extra bias.
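A minimal NumPy sketch of this setup, with illustrative names and shapes: since x never changes, Rx can be computed once and then added to every hidden-state update like an extra bias.

```python
# Minimal sketch (assumed names/shapes): a single fixed context vector x
# is mapped by R and injected into the hidden units at every time step.
import numpy as np

rng = np.random.default_rng(0)
hidden_size, input_size, context_size, T = 8, 5, 4, 10

W = rng.normal(size=(hidden_size, hidden_size))   # hidden-to-hidden weights
U = rng.normal(size=(hidden_size, input_size))    # input-to-hidden weights
R = rng.normal(size=(hidden_size, context_size))  # context-to-hidden weights
b = np.zeros(hidden_size)

x = rng.normal(size=context_size)           # the single fixed context vector
inputs = rng.normal(size=(T, input_size))   # per-step inputs

context_bias = R @ x                        # computed once, reused at every step
h = np.zeros(hidden_size)
for u in inputs:
    h = np.tanh(W @ h + U @ u + b + context_bias)
```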
Only x(t) at step t
This corresponds to the conditional independence assumption that the distribution factorizes as:
∏t p(y(t) | x(1),⋯,x(t))
To remove this conditional independence assumption, we can add connections from the output at time t to the hidden units at time t+1.
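A minimal NumPy sketch of this variant, with illustrative names and shapes: the hidden state at each step is updated from x(t), h(t−1), and the previous output, so the y(t) are no longer modeled as conditionally independent given the x sequence. At training time the fed-back output would typically be the ground-truth y(t−1) (teacher forcing); here the model's own output is reused only to keep the sketch self-contained.

```python
# Minimal sketch (assumed names/shapes): an RNN over an input sequence
# x(1..T) where the previous output also feeds the next hidden state,
# removing the conditional-independence assumption across the y(t).
import numpy as np

rng = np.random.default_rng(0)
hidden_size, input_size, output_size, T = 8, 5, 3, 10

W = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden weights
U = rng.normal(size=(hidden_size, input_size))   # input-to-hidden weights
S = rng.normal(size=(hidden_size, output_size))  # previous-output-to-hidden weights
V = rng.normal(size=(output_size, hidden_size))  # hidden-to-output weights
b = np.zeros(hidden_size)
c = np.zeros(output_size)

xs = rng.normal(size=(T, input_size))            # input sequence x(1..T)

h = np.zeros(hidden_size)
y_prev = np.zeros(output_size)                   # no output before t = 1
outputs = []
for x_t in xs:
    # hidden state sees x(t), h(t-1), and the output from the previous step
    h = np.tanh(W @ h + U @ x_t + S @ y_prev + b)
    y_t = V @ h + c                              # e.g. pre-softmax scores for y(t)
    outputs.append(y_t)
    y_prev = y_t                                 # with teacher forcing this would be the true y(t)
```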