Here is an old email exchange from 2008. It shows that if you have a good idea you should publish it!

-----------------------------------------

ec@cs.brown.edu via cs.toronto.edu
7/9/08
to hinton

Geoff,

In your lecture at Brown you showed some results at the end where you represented individual words as a set of independent variables. I think the idea is that "reading" and "read" might be close according to one variable, while "reading" and "writing" might be close according to another. Would it be possible to see this data?

Eugene

-----------------------------------------

Geoffrey Hinton
7/10/08
to Andriy, ec

Each word is converted into a vector of 100 real values in such a way that the vectors for the previous n words are good at predicting the vector for the next word.

We spent a while looking at the vectors. If you display them in 2-D using one of our recent dimensionality reduction methods that keeps very similar vectors very close, you get cute pictures. Here is a paper describing how the features are obtained and a paper showing the vectors in 2-D. We now have even nicer pictures but I can't find them!

We tried looking at the vectors but couldn't understand the individual features. I could ask the grad student to send you the 17,000 100-D vectors if you want. Let me know.

However, it can do analogies in a very dumb way. To answer A:B = C:? it takes the 100-D vector difference B-A and adds it to C. Then it finds the closest vector that isn't A, B, or C. So it can do is:was = are:? correctly (i.e. it says "were").

Geoff
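The analogy trick Hinton describes can be sketched in a few lines. This is a minimal illustration, not his code: the tiny 2-D vectors below are invented for the example (the real embeddings were learned 100-D vectors over ~17,000 words), and "closest" is taken here to mean smallest Euclidean distance.

```python
import numpy as np

# Toy embeddings, invented purely for illustration.
# The real vectors were 100-D, learned so that the vectors of the
# previous n words are good at predicting the next word's vector.
embeddings = {
    "is":   np.array([1.0, 0.0]),
    "was":  np.array([1.0, 1.0]),
    "are":  np.array([2.0, 0.0]),
    "were": np.array([2.0, 1.0]),
    "cat":  np.array([5.0, 5.0]),
}

def analogy(a, b, c):
    """Answer A:B = C:? by adding the vector difference B-A to C,
    then returning the nearest word that isn't A, B, or C."""
    target = embeddings[b] - embeddings[a] + embeddings[c]
    candidates = {w: v for w, v in embeddings.items() if w not in (a, b, c)}
    return min(candidates, key=lambda w: np.linalg.norm(candidates[w] - target))

print(analogy("is", "was", "are"))  # -> were
```

With these made-up vectors, was - is + are lands exactly on were, reproducing the is:was = are:were example from the email.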