Output units in neural networks¶
Output units form the last layer of a neural network, and the choice of output unit determines the form of the cross-entropy (loss) function.
Linear units for a Gaussian distribution¶
A linear output unit applies an affine transformation with no nonlinearity, \(\hat{y} = W^T h + b\), where \(h\) is the output of the previous hidden layer. It can be used to produce the mean of a conditional Gaussian, \(p(y \mid x) = \mathcal{N}(y; \hat{y}, I)\).
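A minimal NumPy sketch of a linear output unit; the layer sizes and weight names here are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

hidden_dim, out_dim = 4, 2           # illustrative sizes
h = rng.standard_normal(hidden_dim)  # output of the previous hidden layer

# Affine transformation with no nonlinearity: y_hat = W^T h + b.
W = rng.standard_normal((hidden_dim, out_dim))
b = np.zeros(out_dim)
y_hat = W.T @ h + b

# y_hat can serve as the mean of the conditional Gaussian p(y | x) = N(y; y_hat, I).
print(y_hat)
```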
Sigmoid unit for Bernoulli output¶
Used for binary classification: the network outputs \(\hat{y} = \sigma(w^T h + b)\), where \(\sigma(z) = \frac{1}{1 + \exp(-z)}\) is the sigmoid function and \(\hat{y}\) is interpreted as \(P(y = 1 \mid x)\).
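A small sketch of a sigmoid output with its binary cross-entropy loss, written in terms of the pre-activation \(z\) for numerical stability; the input values are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(z, y):
    # -log P(y | z) = softplus((1 - 2y) * z), where softplus(a) = log(1 + exp(a)).
    # np.logaddexp(0, a) computes softplus(a) without overflow.
    return np.logaddexp(0.0, (1.0 - 2.0 * y) * z)

z = np.array([2.0, -1.0])  # pre-activations z = w^T h + b
y = np.array([1.0, 0.0])   # binary targets

print(sigmoid(z))                  # predicted P(y = 1 | x)
print(binary_cross_entropy(z, y))  # per-example loss
```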
Softmax unit for multinomial¶
If the output distribution consists of \(n\) possible values, we can use the softmax function, \(\mathrm{softmax}(z)_i = \frac{\exp(z_i)}{\sum_j \exp(z_j)}\).

We pass a linear function \(z = W^Th + b\) through the softmax. Training with the negative log-likelihood gives the cost \(-\log \mathrm{softmax}(z)_i = -z_i + \log \sum_j \exp(z_j)\), which directly penalizes the most active incorrect prediction. If the largest entry of \(z\) corresponds to the correct answer, then \(-z_i\) and \(\log \sum_j \exp(z_j)\) roughly cancel, so the example contributes only little to the overall training loss; the cost is dominated by examples that are not correctly classified.
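The cancellation is easy to verify numerically. A sketch with made-up logits, using NumPy's stable log-sum-exp:

```python
import numpy as np

def nll(z, i):
    # -log softmax(z)_i = -z_i + log sum_j exp(z_j)
    return -z[i] + np.logaddexp.reduce(z)

z = np.array([8.0, 0.5, -1.0])  # confident prediction for class 0

print(nll(z, 0))  # correct class: -z_0 and logsumexp(z) roughly cancel, loss near 0
print(nll(z, 2))  # incorrect class: large loss dominates the training cost
```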
Mixture density networks¶
In a mixture density network, the outputs of the NN parameterize a mixture of Gaussians, \(p(y \mid x) = \sum_{i=1}^{k} \pi_i(x)\, \mathcal{N}(y; \mu_i(x), \sigma_i^2(x))\): the mixture weights \(\pi_i(x)\) come from a softmax unit, the means \(\mu_i(x)\) from linear units, and the standard deviations \(\sigma_i(x)\) from a unit that enforces positivity.
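A minimal NumPy sketch of a mixture density output layer for scalar \(y\); the weight matrices, sizes, and the softplus choice for the standard deviations are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def mdn_params(h, W_pi, W_mu, W_sigma):
    # Map the hidden vector h to the parameters of a 1-D Gaussian mixture.
    logits = W_pi.T @ h
    e = np.exp(logits - np.max(logits))
    pi = e / e.sum()                          # mixture weights via softmax
    mu = W_mu.T @ h                           # component means (linear units)
    sigma = np.log1p(np.exp(W_sigma.T @ h))   # softplus keeps std devs positive
    return pi, mu, sigma

def mdn_nll(y, pi, mu, sigma):
    # -log p(y | x) = -log sum_i pi_i * N(y; mu_i, sigma_i^2)
    log_comp = (np.log(pi)
                - 0.5 * np.log(2.0 * np.pi * sigma**2)
                - 0.5 * ((y - mu) / sigma) ** 2)
    return -np.logaddexp.reduce(log_comp)

hidden_dim, n_components = 4, 3  # illustrative sizes
h = rng.standard_normal(hidden_dim)
W_pi = rng.standard_normal((hidden_dim, n_components))
W_mu = rng.standard_normal((hidden_dim, n_components))
W_sigma = rng.standard_normal((hidden_dim, n_components))

pi, mu, sigma = mdn_params(h, W_pi, W_mu, W_sigma)
print(mdn_nll(0.0, pi, mu, sigma))  # negative log-likelihood of a target y = 0.0
```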