1. What would happen if we used the semantic hashing network (the one that encodes a document or image with a 30-bit binary code), but instead of 30 logistic units with noise, the code layer consisted of a 30-way softmax with noise?
2. What would happen if we trained an autoencoder with one hidden layer of 500 units to reconstruct 16-by-16 pixel images? In this case we have more hidden units than input/output units. How would a denoising autoencoder, or dropout, change what would happen?
3. Why is it called semantic hashing? See video 15f, 1:11-1:32 (the random bit). Elaborate.
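To make the contrast in the first question concrete, here is a minimal NumPy sketch of the two code layers side by side. The weights, input, and noise scale are all illustrative assumptions, not values from the course; the point it demonstrates is capacity: 30 noisy logistic units can be driven toward a 30-bit binary code (2^30 possible addresses), whereas a 30-way softmax commits to one of only 30 states (about log2(30) ≈ 4.9 bits), so it could not serve as a hash address in the same way.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical sizes: a 256-pixel (16x16) input, a 30-unit code layer.
W = rng.normal(scale=0.1, size=(256, 30))
x = rng.normal(size=256)

# Semantic-hashing style code layer: logistic units with additive
# Gaussian noise on their inputs during training. The noise forces the
# units to operate near 0 or 1 to transmit information reliably, so at
# test time thresholding yields a 30-bit binary code.
pre = x @ W + rng.normal(scale=4.0, size=30)   # noise scale is illustrative
code_bits = (logistic(pre) > 0.5).astype(int)  # one of 2**30 possible codes

# The alternative from the question: a 30-way softmax with noise.
# Its output is a distribution over 30 mutually exclusive states, so
# picking the argmax identifies only one of 30 codes.
code_softmax = softmax(pre)
softmax_state = int(np.argmax(code_softmax))   # one of just 30 possible codes
```

Reading off `code_bits` gives a 30-bit address into memory, which is what makes the binary code layer usable as a hash; the softmax variant collapses that address space to 30 buckets.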