Softmax
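The output units use a non-local non-linearity: each output depends on the total inputs $z_j$ to all the units in the group, not just on its own input.
$$y_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$$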
[Figure: three output units (1, 2, 3) forming one softmax group]
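A minimal Python sketch of this non-linearity, using only the standard library. The name `softmax` is our own choice, and subtracting the largest input before exponentiating is a standard numerical safeguard (the common factor cancels in the ratio), not something the slide specifies. The last lines illustrate the non-locality: raising only the first input lowers the third output, even though the third input is unchanged.

```python
import math

def softmax(z):
    """Softmax over one group of output units: y_i = exp(z_i) / sum_j exp(z_j).

    Subtracting max(z) leaves the result unchanged (it cancels in the
    ratio) but prevents math.exp from overflowing on large inputs.
    """
    m = max(z)
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

z = [1.0, 2.0, 3.0]
y = softmax(z)
print([round(v, 3) for v in y])   # three probabilities
print(sum(y))                     # close to 1 (up to rounding)

# Non-local: change only the first input and the third output moves too.
z2 = [2.0, 2.0, 3.0]
print([round(v, 3) for v in softmax(z2)])
```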
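The cost function is the negative log probability of the right answer:
$$C = -\sum_j t_j \log y_j$$
where $t_j$ is the desired value for output unit $j$ (1 for the right answer, 0 otherwise) and $y_j$ is that unit's output.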
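The cost can be sketched the same way. This is a hedged illustration: `cross_entropy` is a hypothetical helper name, and the targets are assumed to be one-hot, so the sum reduces to minus the log of the right answer's output. Terms with $t_j = 0$ are skipped, which changes nothing mathematically but avoids evaluating `log(0)` if an output underflows to zero.

```python
import math

def softmax(z):
    # Numerically safe softmax over one group of output units.
    m = max(z)
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y, t):
    """C = -sum_j t_j * log(y_j).

    With a one-hot target t this is -log(probability of the right answer).
    Terms with t_j == 0 contribute nothing, so they are skipped.
    """
    return -sum(tj * math.log(yj) for yj, tj in zip(y, t) if tj > 0)

y = softmax([1.0, 2.0, 3.0])
t = [0.0, 0.0, 1.0]        # unit 3 is the right answer
print(cross_entropy(y, t))  # equals -log(y[2])
```

The cost is small when the right answer already has most of the probability and grows without bound as that probability approaches zero.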
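The steepness of $C$ exactly balances the flatness of the output non-linearity. If the right answer's output $y_i$ is near 0, the non-linearity is nearly flat, since $\partial y_i/\partial z_i = y_i(1 - y_i) \approx 0$, but the cost is very steep, since $\partial C/\partial y_i = -t_i/y_i$ is large. Summing the chain rule over all outputs (with targets that sum to 1) gives
$$\frac{\partial C}{\partial z_i} = \sum_j \frac{\partial C}{\partial y_j}\,\frac{\partial y_j}{\partial z_i} = y_i - t_i$$
so the gradient with respect to the total input $z_i$ stays large whenever the output is badly wrong.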
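To see the balance numerically, the sketch below (same assumptions: one-hot targets, hypothetical helper names) compares the analytic gradient $y_i - t_i$ with a central finite-difference estimate, for a case where the softmax is saturated on the wrong answer. The slope of the non-linearity for the right answer is tiny and the slope of the cost is huge, but their product, like the full gradient for that unit, stays close to $-1$.

```python
import math

def softmax(z):
    # Numerically safe softmax over one group of output units.
    m = max(z)
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

def cost(z, t):
    # Negative log probability of the right answer (t assumed one-hot).
    y = softmax(z)
    return -sum(tj * math.log(yj) for yj, tj in zip(y, t) if tj > 0)

def analytic_grad(z, t):
    # dC/dz_i = y_i - t_i, valid when the targets sum to 1.
    y = softmax(z)
    return [yi - ti for yi, ti in zip(y, t)]

def numeric_grad(z, t, eps=1e-6):
    # Central finite differences, perturbing one input at a time.
    g = []
    for i in range(len(z)):
        zp = list(z)
        zp[i] += eps
        zm = list(z)
        zm[i] -= eps
        g.append((cost(zp, t) - cost(zm, t)) / (2 * eps))
    return g

# Saturated and wrong: unit 1 gets almost all the probability,
# but the right answer is unit 3.
z = [8.0, 0.0, -8.0]
t = [0.0, 0.0, 1.0]
y = softmax(z)

flat = y[2] * (1 - y[2])   # dy_3/dz_3: nearly zero
steep = -t[2] / y[2]       # dC/dy_3: very large in magnitude
print("dy3/dz3 =", flat)
print("dC/dy3  =", steep)
print("product =", steep * flat)   # about -1
print("analytic dC/dz:", analytic_grad(z, t))
print("numeric  dC/dz:", numeric_grad(z, t))
```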