
June-July 2009


Hello friends,

In June I attended a conference on vision in 3D environments at York University here in Toronto. It covered human vision and psychology, which are complementary to computer vision. The format was invited talks, where people present their research over the last 10 years, which can sometimes be more interesting than conferences where people compete for the reinvention of the year. In the past I felt psychologists were asking very deep questions, but lacked the computational and engineering background to address them. However, it seems they have exhausted the paper questionnaires, and nowadays cutting-edge psychology has become computer-intensive. For example, Bill Warren presented research on human navigation in virtual-reality mazes. The maze had worm-tunnels between locations, which participants didn't notice, leading to the conclusion that humans create only local maps of their environment in their heads.

That brings me to the Netflix competition, which just closed, although the winners have not been announced officially. You would imagine that matching movies to people would involve understanding human psychology, but it seems the winners relied on heavy number crunching and statistics. It is too early to say much, since the participants haven't revealed their strategies. Maybe there are some original ideas. But from what has been published, it seems thousands of contestants tried every formula and tweak on the planet for two years to shave 5% from the score they got after several months.
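As far as I remember from the public contest rules, submissions were ranked by root-mean-squared error (RMSE) on held-out ratings; here is a minimal sketch of that metric, with made-up toy numbers:

```python
import math

def rmse(predicted, actual):
    """Root-mean-squared error between predicted and actual ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Toy example: three predicted ratings vs. three true ratings (1-5 stars).
print(rmse([3.5, 4.0, 2.0], [4, 4, 3]))
```

The metric is blind to *why* a prediction is right or wrong, which is part of my point below: lowering RMSE and understanding viewers are two different things.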

Einstein said that God does not play dice, and I agree. There are definitely places for statistics, but professional statisticians validate their statistical models before jumping to conclusions; they have had enough experience with false predictions. So what is wrong with statistics and data fitting? First, it treats errors as part of the model: when something goes wrong, it is absorbed into the error term and nobody feels bad about it. Secondly, there is publication bias: suppose there is a 5% probability that an experiment will succeed by chance. If it is carried out at 20 places around the world, it is likely that one will succeed by chance alone and will publish the result. The other 19, not knowing about each other, will think they did something wrong.

Thirdly, statistics might be an over-simplification at the wrong level. A famous example is the Ptolemaic system of astronomy, which had the stars surrounding the earth in circular orbits. When the orbits turned out to be inaccurate, epicycles (circles around circles) were added. The model was accurate enough to last over 1000 years, until Kepler suggested elliptical orbits, which led to the theory of gravity. There are many other examples. One I saw some time ago is an embedding of the US Supreme Court judges on the real line between liberals and conservatives (i.e., a number was computed for each judge from similarity scores to the other judges). This has nothing to do with legal philosophy (I am sure the authors did it only as an exercise). Another example you might have heard about is an attempt to define the visual beauty of faces by their distance from an average face. Again, a nice exercise, but it ignores the experience accumulated in art history. In short, there is a difference between accurate prediction and complete explanation. Whenever people get useful predictors, they usually stop thinking about the underlying causes.
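The publication-bias arithmetic can be checked directly; a quick sketch, using the illustrative 5% and 20-lab figures from the example above:

```python
# Each experiment "succeeds" by chance with probability 5%,
# and 20 independent labs run it.
p_chance = 0.05
n_labs = 20

# Probability that at least one lab gets a chance success
# (complement of "all 20 fail"):
p_at_least_one = 1 - (1 - p_chance) ** n_labs
print(f"{p_at_least_one:.2f}")  # prints 0.64
```

So even at the conventional 5% significance level, a positive result somewhere in the world is more likely than not, and that is the one that gets published.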

I think the Netflix competition was an exciting one that wouldn't have been possible at all only 10 years ago, with lots of drama at the end. It is definitely worth a 5-star movie. But what are the implications? Are we going to walk into a store 10 years from now, where a camera will take our picture and a printer will hand us a shopping list? Where is the line between computerized recommendations and computers telling us what to do?


Ady.