1. What would happen if we used the semantic hashing network (the one that encodes a document or image with a 30-bit binary code), but instead of 30 logistic units with noise, the code layer consisted of a 30-way softmax with noise?
2. What would happen if we trained an autoencoder with one hidden layer of 500 units to reconstruct 16-by-16 pixel images? In this case we have more hidden units than input/output units. How would a denoising autoencoder, or dropout, change what would happen?
3. Why is it called semantic hashing? See video 15f, 1:11-1:32 (the random bit). Elaborate.
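To make the contrast in the first question concrete, here is a minimal NumPy sketch of the two code layers side by side. The weights, input, and noise scale are all illustrative assumptions, not values from the course; the point it demonstrates is capacity: 30 noisy logistic units can be driven toward a 30-bit binary code (2^30 possible addresses), whereas a 30-way softmax commits to one of only 30 states (about log2(30) ≈ 4.9 bits), so it could not serve as a hash address in the same way.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical sizes: a 256-pixel (16x16) input, a 30-unit code layer.
W = rng.normal(scale=0.1, size=(256, 30))
x = rng.normal(size=256)

# Semantic-hashing style code layer: logistic units with additive
# Gaussian noise on their inputs during training. The noise forces the
# units to operate near 0 or 1 to transmit information reliably, so at
# test time thresholding yields a 30-bit binary code.
pre = x @ W + rng.normal(scale=4.0, size=30)   # noise scale is illustrative
code_bits = (logistic(pre) > 0.5).astype(int)  # one of 2**30 possible codes

# The alternative from the question: a 30-way softmax with noise.
# Its output is a distribution over 30 mutually exclusive states, so
# picking the argmax identifies only one of 30 codes.
code_softmax = softmax(pre)
softmax_state = int(np.argmax(code_softmax))   # one of just 30 possible codes
```

Reading off `code_bits` gives a 30-bit address into memory, which is what makes the binary code layer usable as a hash; the softmax variant collapses that address space to 30 buckets.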