Leaky Units

Leaky units are units with a linear self-connection and a weight near 1 on that connection. The idea is the same as computing a running average μ(t) of some value v(t):

μ(t) = αμ(t−1) + (1−α)v(t)
  • α is the weight of a linear self-connection from the past value μ(t−1) to μ(t). If it is near 1, the running average remembers information about the past for a long time. If it is close to 0, information about the past is rapidly discarded.
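
To make the effect of α concrete, here is a minimal sketch of the running-average update applied to a step signal. The function name `running_average`, the initial value `mu0` and the test signal are illustrative assumptions, not part of the original notes.

```python
def running_average(values, alpha, mu0=0.0):
    """Apply mu(t) = alpha * mu(t-1) + (1 - alpha) * v(t) to each value in turn.

    mu0 is an assumed starting value for mu before the first observation.
    Returns the list of running averages, one per input value.
    """
    mu = mu0
    history = []
    for v in values:
        mu = alpha * mu + (1.0 - alpha) * v
        history.append(mu)
    return history


# Hypothetical step signal: ten zeros followed by ten ones.
signal = [0.0] * 10 + [1.0] * 10

slow = running_average(signal, alpha=0.95)  # alpha near 1: long memory
fast = running_average(signal, alpha=0.10)  # alpha near 0: short memory

print(f"alpha=0.95: average after the step = {slow[-1]:.3f}")
print(f"alpha=0.10: average after the step = {fast[-1]:.3f}")
```

Ten steps after the signal jumps to 1, the average with α = 0.95 has covered only about 40% of the distance (1 − 0.95^10 ≈ 0.40), because it still carries the earlier zeros. With α = 0.10 it has essentially reached 1.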

This trade-off between keeping and discarding the past is the idea behind leaky units, where the constant α is either fixed or learned as a free parameter.