Polyak Averaging
Average several points that were visited by an optimization algorithm. If t iterations of gradient descent visit points \(\theta^{(1)}, \cdots, \theta^{(t)}\), then the Polyak average is:
\[
\hat{\theta}^{(t)} = \frac{1}{t} \sum_{i=1}^{t} \theta^{(i)}
\]
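As a minimal sketch, the average can be maintained incrementally alongside gradient descent, using the running-mean update \(\hat\theta^{(t)} = \hat\theta^{(t-1)} + (\theta^{(t)} - \hat\theta^{(t-1)})/t\), so no history of iterates needs to be stored (the function and step size below are illustrative, not from the text):

```python
import numpy as np

def gd_with_polyak_average(grad, theta0, lr=0.1, steps=100):
    """Run plain gradient descent and maintain the Polyak average
    of all visited iterates with an incremental mean."""
    theta = np.asarray(theta0, dtype=float)
    avg = np.zeros_like(theta)
    for t in range(1, steps + 1):
        theta = theta - lr * grad(theta)
        # incremental mean: avg_t = avg_{t-1} + (theta_t - avg_{t-1}) / t
        avg += (theta - avg) / t
    return theta, avg

# Example: minimize f(x) = x^2, whose gradient is 2x.
final, polyak = gd_with_polyak_average(lambda x: 2 * x, [1.0])
```

The incremental form is algebraically identical to \(\frac{1}{t}\sum_i \theta^{(i)}\) but needs only O(1) extra memory.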
For convex problems this has strong convergence guarantees. For nonconvex settings we have to augment it a bit and use an exponentially decaying running average:
\[
\hat{\theta}^{(t)} = \alpha\hat\theta^{(t-1)} + (1-\alpha) \theta^{(t)}
\]
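A sketch of the exponential variant, assuming a decay factor \(\alpha\) close to 1 (0.99 here is a common but illustrative choice); the averaged "shadow" parameters are what would be used at evaluation time:

```python
def ema_update(avg, theta, alpha=0.99):
    """One step of the exponentially decaying running average:
    avg_t = alpha * avg_{t-1} + (1 - alpha) * theta_t."""
    return alpha * avg + (1 - alpha) * theta

# Shadow parameters track the iterates with exponential decay;
# recent points dominate, old points are forgotten geometrically.
avg = 0.0
for theta in [1.0] * 200:  # toy stream of iterates
    avg = ema_update(avg, theta, alpha=0.9)
```

Unlike the plain mean, the weight on \(\theta^{(i)}\) decays as \(\alpha^{t-i}\), so the average can track a moving optimum in nonconvex training.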