Neural network universal approximator
Given enough hidden units and at least one hidden layer with a nonlinear activation, a neural network can approximate any continuous function on a closed, bounded input domain with arbitrarily low error. However, training may still fail if
- the learning algorithm does not find the correct weights
- the learning algorithm overfits
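The approximation claim above can be illustrated with a minimal sketch. It assumes a one-hidden-layer tanh network whose hidden weights are drawn at random and kept fixed, with only the output layer fitted by least squares. This is a shortcut chosen for reproducibility, not ordinary gradient-based training, and the names `fit_rmse`, `MAX_WIDTH`, and `rmse_by_width` as well as the target `sin(x)` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a smooth function on a closed, bounded interval.
x = np.linspace(-np.pi, np.pi, 400)
y = np.sin(x)

# Draw hidden-layer weights once for the widest network; narrower networks
# reuse the first `width` units, so the feature sets are nested.
MAX_WIDTH = 50
w_hidden = rng.normal(size=MAX_WIDTH)
b_hidden = rng.normal(size=MAX_WIDTH)


def fit_rmse(width):
    """Fit only the output layer by least squares and return the training RMSE."""
    hidden = np.tanh(np.outer(x, w_hidden[:width]) + b_hidden[:width])
    features = np.column_stack([hidden, np.ones_like(x)])  # add output bias
    w_out, *_ = np.linalg.lstsq(features, y, rcond=None)
    return float(np.sqrt(np.mean((features @ w_out - y) ** 2)))


rmse_by_width = {width: fit_rmse(width) for width in (2, 10, 50)}
for width, rmse in rmse_by_width.items():
    print(f"{width:3d} hidden units -> training RMSE {rmse:.2e}")
```

The reported error is measured on the training points only, so it shows that suitable weights exist for a wide enough layer. It says nothing about the two failure modes listed above: a real learning algorithm still has to find such weights, and a low training error does not rule out overfitting.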
In general, it is advisable to prefer deeper networks over wider ones, since depth can reduce the number of units needed to reach a given accuracy.
By choosing a deep model, we express the belief that learning consists of discovering a set of underlying factors of variation that can in turn be expressed in terms of other, simpler underlying factors of variation.
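One way to see why depth can save units is a small sketch with a hand-built ReLU network on a scalar input. It assumes each layer has two ReLU units that together compute a tent map on [0, 1]; stacking such layers composes the map with itself. The construction and the helper names `tent_layer`, `deep_tent`, and `count_linear_pieces` are illustrative and do not come from a trained model.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def tent_layer(x):
    """One hidden layer with two ReLU units that computes a tent map on [0, 1]."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def deep_tent(x, depth):
    """Stack `depth` tent layers, i.e. 2 * depth hidden units in total."""
    for _ in range(depth):
        x = tent_layer(x)
    return x


def count_linear_pieces(y):
    """Count linear pieces of a sampled zigzag by counting slope sign changes."""
    slope_sign = np.sign(np.diff(y))
    return int(np.sum(slope_sign[1:] != slope_sign[:-1])) + 1


# Dyadic grid so every breakpoint of the zigzag lands exactly on a sample.
grid = np.arange(2**12 + 1) / 2**12

pieces_by_depth = {d: count_linear_pieces(deep_tent(grid, d)) for d in (1, 3, 6)}
for depth, pieces in pieces_by_depth.items():
    print(f"depth {depth}: {2 * depth} hidden units, {pieces} linear pieces")
```

In this construction the number of linear pieces doubles with every added layer, while the number of hidden units grows by only two. A single hidden layer of n ReLU units on a scalar input can produce at most n + 1 linear pieces, so matching the depth-6 network (12 units, 64 pieces) with one layer would take at least 63 units.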
