March 2007

Hello friends,

Before getting into what computer vision research is like, I should give some picture of what computer vision is. There are tons of vision books (Hubel's is a classic), and it is easy to get lost among them. Vision is one of those areas where, if you don't feel lost, you are already lost.

The starting point for vision is the eye and the brain. Some engineering-oriented courses never mention human perception, and that is criminal. Let's begin with some basic questions. How many eyes do you have? How many images do you see? Is the image coming from your left eye or your right eye? Have you ever noticed that you see the world from a cyclopean viewpoint?

Cover one eye with your hand, while keeping both eyes open. Without your noticing, your brain has just performed several interesting operations. First, although only half as much light now reaches your eyes, nothing looks darker (this is called gain control). Second, notice that you still see objects in 3D. Many people don't believe 3D perception is possible without stereo vision, but because the baseline between the eyes is short, stereo is effective only within a few meters. Third, the hand covering your eye has almost disappeared. A large number of such visual illusions are known. Things get even weirder for people with abnormal vision, such as stereo blindness, motion blindness, neglect (ignoring part of the visual field), or face blindness (prosopagnosia). There may even be human tetrachromats, people with four types of color receptors instead of three. The bottom line is that the brain performs a great deal of processing unconsciously, and perception is not always veridical.
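The short-baseline argument can be made concrete with simple geometry. For a small depth difference ΔZ at viewing distance Z, the angular disparity between the two eyes is roughly B·ΔZ/Z², where B is the baseline (the distance between the eyes). Setting this equal to the smallest disparity the visual system can detect, η, gives ΔZ ≈ η·Z²/B, so the smallest resolvable depth difference grows with the square of the distance. The Python sketch below evaluates this small-angle approximation; the baseline of 6.5 cm and the stereo acuity of 20 arcseconds are assumed typical values chosen for illustration, not measurements.

```python
import math


def min_depth_difference(distance_m, baseline_m=0.065, acuity_arcsec=20.0):
    """Smallest resolvable depth difference (in meters) at a given distance (in meters).

    Small-angle approximation: delta_Z ~= eta * Z**2 / B, with eta the stereo
    acuity in radians. The default baseline and acuity are assumed, not measured.
    """
    eta = math.radians(acuity_arcsec / 3600.0)  # arcseconds -> degrees -> radians
    return eta * distance_m ** 2 / baseline_m


# Print the threshold at a few viewing distances to show the quadratic growth.
for z in (0.5, 2.0, 10.0, 50.0):
    print(f"at {z:5.1f} m: about {min_depth_difference(z) * 100:.2f} cm")
```

Doubling the distance quadruples the threshold, which is why a baseline of only a few centimeters limits stereo to relatively close range and other depth cues matter more farther away.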

Historically, scientists studied visual perception long before they had computers. Some psychology-oriented courses never mention computer vision, and that is criminal. When people tried to program computers to see via cameras, they realized the problem is much more complicated than it seems. Perceiving 3D from a 2D image is a huge open question (although there are many theories about all sorts of depth cues). Understanding what is in an image requires some models and assumptions, since an image alone does not contain enough information. Interpretation depends not only on the image, but also on the prior knowledge and memory of the observer (as images with multiple interpretations show). In addition, the real world is pretty complicated. The same object may look very different from different viewpoints, under different illumination, or when partly occluded.

Since then, people have applied every known approach to vision problems, borrowing methods from mathematics, statistics, physics, and biology, or just implementing reasonable heuristics. All these methods have one thing in common: they have not solved the problem. However, there is now a growing number of applications where computers do reasonably well and sometimes even outperform humans. Unfortunately, there is a big difference between something that works and something that understands. To illustrate, imagine we want to create a system that recognizes oranges. We pick a pixel and check whether its color is orange. That may be good enough for an application. But what if the image is in grayscale? What if the orange is cut open? What if it is on a tree? What if we observe a clementine? Such a system shouldn't be called a recognizer or a detector, but a color matcher. Most vision systems are just more complicated versions of this.
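To make the orange example concrete, here is a minimal Python sketch of the single-pixel color test described above. The reference orange (255, 165, 0), the tolerance of 40, the sample pixel values, and the helper names are all hypothetical choices for illustration. It shows the matcher accepting a clementine-colored pixel while rejecting the very same orange once the pixel is converted to grayscale.

```python
# A deliberately naive "orange detector" that only matches colors.
# REFERENCE_ORANGE and TOLERANCE are illustrative assumptions, not tuned values.
REFERENCE_ORANGE = (255, 165, 0)  # RGB
TOLERANCE = 40


def looks_orange(pixel):
    """Return True if every RGB channel is within TOLERANCE of the reference orange."""
    return all(abs(c - r) <= TOLERANCE for c, r in zip(pixel, REFERENCE_ORANGE))


def to_grayscale(pixel):
    """Convert an RGB pixel to gray using the common BT.601 luma weights."""
    r, g, b = pixel
    y = round(0.299 * r + 0.587 * g + 0.114 * b)
    return (y, y, y)


orange_pixel = (250, 150, 20)      # hypothetical pixel sampled from an orange
clementine_pixel = (245, 140, 10)  # hypothetical pixel sampled from a clementine

print(looks_orange(orange_pixel))                # True
print(looks_orange(to_grayscale(orange_pixel)))  # False: the gray pixel is (165, 165, 165)
print(looks_orange(clementine_pixel))            # True: the matcher cannot tell them apart
```

Any pixel with the right color passes, whatever object it belongs to, and a grayscale orange fails; the test knows about color, not about oranges.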

Computer vision has a very special place in computer science. First, it is not focused on the inside of the computer; it deals primarily with understanding the world outside it. Second, solutions are guaranteed to exist, since biological vision is living proof. The situation is very similar to the early days of aviation. For many years people tried to invent flying machines by imitating birds, until the Wright brothers came up with a fixed-wing design that actually worked. Their machine worked differently from the way birds fly, but the idea of using wings was taken from nature. Similarly, biological vision sets a high standard for computer vision, and motivates vision people to keep looking.

Ady.