Leaky Unis¶
We with linear self-connection and weight near 1 on these connections. The idea is that if we calculate the running average μt for some value v(t)
μ(t)=αμ(t−1)+(1−α)v(t)
α is a linear self connection from the past value μ(t−1) to μ(t). If it is near 1 than the running average remembers information about the past for a long time. If it is close to 0 the information about the past is rapidly discarted.
This idea of discarding an keeping past is the idea behind leaky units. Where the constant α is either fixed or learned as a free parameter.