The cross-entropy or “softmax” error function
for multi-class classification
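The output units use a non-local non-linearity (the softmax):
$$y_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$$
where $x_i$ is the total input to output unit $i$ and $y_i$ is its output. It is non-local because each $y_i$ depends on the inputs to all the output units, not just its own.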
[Figure: three output units, labelled 1, 2, 3, shown with the target value for each of the three units.]
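The non-local behaviour of the output units can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `softmax` and the example inputs are assumptions, and subtracting the maximum input is a standard numerical-stability step that leaves the result unchanged.

```python
import math

def softmax(x):
    """Map a list of total inputs x to outputs that are positive and sum to 1."""
    m = max(x)  # assumption: shift by the max for numerical stability only
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical total inputs to three output units.
y = softmax([1.0, 2.0, 3.0])

# Raise only the third input: all three outputs change, not just the third.
y_shifted = softmax([1.0, 2.0, 4.0])
```

Changing a single input moves every output, because they all share the same normalising sum.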
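The natural cost function is the negative log probability of the right answer:
$$E = -\sum_j t_j \ln y_j$$
where $t_j$ is the target value for output unit $j$ (1 for the right answer, 0 otherwise) and $y_j$ is that unit's output.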
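As a rough sketch of this cost, assuming a one-hot target vector and outputs that already sum to 1, the function below returns the negative log probability assigned to the right answer. The name `cross_entropy` and the example numbers are illustrative assumptions.

```python
import math

def cross_entropy(y, t):
    """Negative log probability of the target: -sum_j t_j * ln(y_j)."""
    # Terms with t_j == 0 contribute nothing, so skip them (also avoids log(0)).
    return -sum(tj * math.log(yj) for yj, tj in zip(y, t) if tj > 0)

t = [0.0, 0.0, 1.0]                    # hypothetical target: class 3 is right
confident_right = [0.05, 0.05, 0.90]   # most probability on the right answer
confident_wrong = [0.90, 0.05, 0.05]   # most probability on a wrong answer

loss_right = cross_entropy(confident_right, t)
loss_wrong = cross_entropy(confident_wrong, t)
```

The cost is small when the right answer gets high probability and grows quickly as that probability approaches zero.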
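The steepness of E exactly balances the flatness of the softmax, so the derivative with respect to the total input $x_i$ of output unit $i$ is just the output minus the target:
$$\frac{\partial E}{\partial x_i} = \sum_j \frac{\partial E}{\partial y_j}\frac{\partial y_j}{\partial x_i} = y_i - t_i$$
where $y_i$ is the unit's output and $t_i$ is its target value.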
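The balance between the two can be checked numerically. The sketch below, under the assumption of a one-hot target, compares the output-minus-target gradient against a central finite difference of E taken with respect to the total inputs. All function names and input values here are hypothetical.

```python
import math

def softmax(x):
    m = max(x)  # shift by the max for numerical stability only
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def cost(x, t):
    """E as a function of the total inputs x: -sum_j t_j * ln(softmax(x)_j)."""
    y = softmax(x)
    return -sum(tj * math.log(yj) for yj, tj in zip(y, t) if tj > 0)

def analytic_grad(x, t):
    """Output minus target for each unit."""
    return [yi - ti for yi, ti in zip(softmax(x), t)]

def numeric_grad(x, t, eps=1e-6):
    """Central finite-difference estimate of dE/dx_i."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((cost(up, t) - cost(down, t)) / (2 * eps))
    return grad

x = [1.0, 2.0, 3.0]
t = [0.0, 0.0, 1.0]
max_gap = max(abs(a - n)
              for a, n in zip(analytic_grad(x, t), numeric_grad(x, t)))

# A badly wrong, saturated case: the softmax output for unit 3 is nearly flat,
# yet the gradient on its input stays close to -1 instead of vanishing.
grad_wrong = analytic_grad([10.0, 0.0, -10.0], t)
```

In the saturated case the softmax output for the right answer barely changes with its input, but the log in E is steep there, so the combined gradient remains large.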