Clustering and backpropagation
• We need to tie the input->hidden weights to be the same as
the hidden->output weights.
– Usually, we cannot backpropagate through binary hidden
units, but in this case the derivatives for the
input->hidden weights all become zero!
• If the winner doesn’t change – no derivative
• The winner changes only when two hidden units give exactly
the same error, so switching between them does not change
the error – no derivative
• So the only error-derivative is for the hidden->output
weights. This derivative pulls the weight vector of the winning
cluster towards the data point. When the weight vector is at the
center of gravity of a cluster, the derivatives all balance out
because the center of gravity minimizes squared error.
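
The argument above can be checked numerically. The following is a minimal sketch, not a definitive implementation: it assumes the squared error E = 0.5 * ||x - w||^2 for the winning unit, plain batch gradient descent, and a toy data set. The function names, the learning rate and the data are all hypothetical choices made for illustration. Because the weights are tied, the same vectors in `centers` are used both to pick the winner and to reconstruct the data point.

```python
def sq_dist(x, w):
    """Squared Euclidean distance between data point x and weight vector w."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))

def winner(x, centers):
    """Index of the hidden unit (cluster) whose weight vector is closest to x."""
    return min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))

def output_weight_grad(x, w):
    """Gradient of E = 0.5 * ||x - w||^2 with respect to w, i.e. w - x."""
    return [wi - xi for xi, wi in zip(x, w)]

def summed_grads(data, centers):
    """Sum the gradients over the data; only the winner receives a gradient."""
    grads = [[0.0] * len(c) for c in centers]
    for x in data:
        k = winner(x, centers)
        g = output_weight_grad(x, centers[k])
        grads[k] = [a + b for a, b in zip(grads[k], g)]
    return grads

# Toy data (assumed): two well-separated clusters of two points each.
data = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]]
centers = [[0.0, 0.0], [12.0, 10.0]]  # assumed initial weight vectors
lr = 0.1                               # assumed learning rate

for _ in range(200):
    grads = summed_grads(data, centers)
    # Gradient descent: each step pulls a weight vector towards its data points.
    centers = [[wi - lr * gi for wi, gi in zip(w, g)]
               for w, g in zip(centers, grads)]

final_grads = summed_grads(data, centers)
```

In this sketch each weight vector settles at the mean of the points it wins, and the summed gradients in `final_grads` are then numerically zero, which is the "derivatives all balance out" condition on the slide.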