Output units in neural networks¶
Output units form the last layer of a neural network, and their form determines the choice of cross-entropy (loss) function.
Linear units for Gaussian distribution¶
An affine transformation with no nonlinearity: ŷ = WTh + b
h is the output of the previous hidden layer
Can be used to produce the mean of a conditional Gaussian
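A minimal sketch of this idea in NumPy (all function and variable names here are hypothetical, chosen for illustration): a linear output unit produces the mean of a conditional Gaussian, and maximizing the Gaussian log-likelihood then amounts to minimizing mean squared error.

```python
import numpy as np

def linear_output(h, W, b):
    # Affine transformation with no nonlinearity: y_hat = W^T h + b
    return W.T @ h + b

def gaussian_nll(y_hat, y):
    # Negative log-likelihood of y under N(y; y_hat, I), up to a constant,
    # which reduces to half the squared error.
    return 0.5 * np.sum((y_hat - y) ** 2)

rng = np.random.default_rng(0)
h = rng.normal(size=4)        # output of the previous hidden layer
W = rng.normal(size=(4, 2))   # 4 hidden units -> 2 output units
b = np.zeros(2)

y_hat = linear_output(h, W, b)   # mean of the conditional Gaussian
```

Because the loss is just squared error, a perfect prediction gives zero cost and the gradient with respect to ŷ is simply (ŷ − y).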
Sigmoid Unit for Bernoulli output¶
Used for binary classification: ŷ = σ(wTh + b)
σ is the sigmoid function
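A short NumPy sketch (names hypothetical): the sigmoid squashes the pre-activation z = wTh + b into a Bernoulli probability, and the cross-entropy loss can be written in a numerically stable softplus form directly in terms of z.

```python
import numpy as np

def sigmoid(z):
    # σ(z) = 1 / (1 + exp(-z)), maps z to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(z, y):
    # -log P(y | z) = softplus((1 - 2y) * z) for y in {0, 1};
    # using np.logaddexp avoids underflow in log(sigmoid(z)).
    return np.logaddexp(0.0, (1 - 2 * y) * z)

z = 2.0                        # pre-activation w^T h + b
p = sigmoid(z)                 # P(y = 1 | x)
loss_agree = binary_cross_entropy(z, 1)     # small: prediction matches label
loss_disagree = binary_cross_entropy(z, 0)  # larger: prediction is wrong
```

Working in terms of z rather than σ(z) is the standard trick for stability: the loss saturates only when the model is already confidently correct.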
Softmax unit for multinomial¶
If the output distribution consists of n possible values, we can use the softmax function
Thus we pass a linear function z = WTh + b through a softmax function. The log-likelihood cost for the correct class i is −log softmax(z)i = −zi + log∑jexp(zj), so the softmax directly penalizes the most active incorrect prediction. If the answer at the largest input of the softmax is correct, then −zi and log∑jexp(zj) roughly cancel. Thus that example contributes only a little to the overall training loss, and the cost is dominated by examples that are not correctly classified.
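The cancellation can be seen numerically in a small NumPy sketch (function names hypothetical): for a confidently correct example the cost is near zero, while a misclassified example dominates.

```python
import numpy as np

def softmax(z):
    # Shift by max(z) for numerical stability; the result is unchanged.
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(z, i):
    # -log softmax(z)_i = -z_i + log sum_j exp(z_j),
    # computed with the same max-shift for stability.
    m = z.max()
    return -z[i] + m + np.log(np.sum(np.exp(z - m)))

z = np.array([5.0, 0.0, -1.0])    # class 0 is strongly preferred
loss_correct = cross_entropy(z, 0)  # near zero: -z_0 and logsumexp cancel
loss_wrong = cross_entropy(z, 1)    # large: dominates the training cost
```

Here loss_correct ≈ 0.009 while loss_wrong ≈ 5.01, matching the claim that well-classified examples contribute little to the total cost.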