John DiMarco on Computing (and occasionally other things)
I welcome comments by email to jdd at cs.toronto.edu.

Mon 30 Sep 2019 00:00

What's all the fuss about AI anyway?

Brain-shaped network image by Gordon Johnson from Pixabay
A great deal has been written about Artificial Intelligence (AI) in the past five years. But there's a lot of confusion about what AI actually is, and why it is of special interest now. Let's clear up some of that confusion. In ordinary language, what is this fuss about AI all about?

AI, broadly understood, is a term used to describe a set of computing techniques that allow computers to do things that human beings use intelligence to do. This is not to say that the computer is intelligent, but rather that the computer is doing something that, if done by a person, would be considered evidence of that person's intelligence. Contrary to widespread opinion, this is not the same thing as an artificial person. In fact, there have long been many things that humans use intelligence to do but that computers do better, whether it be remembering and recalling items, doing arithmetic, or playing chess. But computers do these things using different techniques than humans do. For example, Deep Blue, a custom chess computer built by IBM, beat Garry Kasparov, the then-reigning world chess champion, in 1997, but Deep Blue played chess in a very different way than Garry did: Garry relied on his human intelligence, while Deep Blue used programming and data.

However, some computer scientists, noting that people can do things that computers can't, thought long and hard about how people do those things, and about how computers might be programmed to do the same. One such technique, deep learning, a neural network technique modelled after the human brain, has been worked on since the 1980s with slow but steady improvement. But computer power was limited and error rates were often high, and for many years most computer scientists seemed to feel that other techniques would yield better results. A few kept at it, though, knowing that while the computers of the day were inadequate, advances in computing would make possible things that weren't possible before.

This all changed in 2012, when one such researcher, Geoff Hinton, and his students, working here at the University of Toronto, published a seminal deep learning paper that cut error rates dramatically. I remember supporting Geoff's group's research computing at that time. It was a bit challenging: we were using multiple GPUs per machine to train machine learning models at a time when GPU computing was still rather new and somewhat unreliable. But GPUs were absolutely necessary: without them, training a model would have taken months of computing time instead of days. One of our staff, Relu Patrascu, a computer scientist and skilled system administrator working hand-in-glove with the researchers, tuned and configured and babysat those machines as if they were sick children. But it worked! Suddenly deep learning could produce results closer to what people could do, and that was only the beginning. Since then, deep learning has produced terrific results in all sorts of domains, some exceeding what people can do, and we've not even scratched the surface of what is possible.

But what does deep learning actually do? At its core, it is a data classification technique: it takes input data and classifies it. Give it a thing, and it will figure out what the thing is. But it classifies things in a way that is different from, and more useful than, traditional computer science methods for classification, such as computer programming or data storage and retrieval (databases). As such, it can be used to do a lot more than computers previously had been able to do.

To see this, consider traditional computer science methods: for example, computer programming. This approach requires a person to write code that explicitly considers the different cases. Imagine that you want to classify two-dimensional figures according to whether or not they are regular polygons. You could write a computer program that encodes a definition of a regular polygon and checks each characteristic of an input shape to see whether or not it matches that definition. Such a program, when given a square, will notice that it is a polygon, that it has four sides, and that those sides (and the angles between them) are all equal. Since the programmer put into the program a detailed definition of what a regular polygon is, and since the program checks each feature explicitly, it can tell whether or not a shape is a regular polygon, even if the program has never seen that particular shape before.

But what about exceptional cases? Is a circle a regular polygon? It is, after all, the limit of a regular N-gon as N goes to infinity. This is an "edge case", and programs need to handle edge cases explicitly: a programmer has to anticipate each one and write it into the program. Moreover, if you wanted to consider some other type of shape, a programmer would have to rewrite the code accordingly. There's no going from a bunch of examples to working code without a programmer to write it. Programming is certainly a useful technique, but it has its limits. Wouldn't it be nice to be able to learn from a bunch of examples, without a person having to write all that code?
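
To make the programming approach concrete, here is a minimal sketch in Python (my own illustration, not code from any real classifier; the shape representation and function name are made up). The definition of a regular polygon is spelled out in code, each feature of the input is checked explicitly, and the circle edge case has to be anticipated by the programmer:

    import math

    def is_regular_polygon(shape):
        """Rule-based classifier: every rule is written out by the programmer.

        `shape` is assumed to be a dict such as
            {"kind": "polygon", "side_lengths": [...], "angles": [...]}
        or  {"kind": "circle", "radius": ...}.
        """
        # Edge case: the circle must be anticipated explicitly by the programmer.
        if shape["kind"] == "circle":
            return False  # or True, if we decide to treat it as a limiting case

        if shape["kind"] != "polygon":
            return False

        sides, angles = shape["side_lengths"], shape["angles"]
        if len(sides) < 3:
            return False

        # The definition, spelled out: all sides equal and all angles equal.
        equal_sides = all(math.isclose(s, sides[0]) for s in sides)
        equal_angles = all(math.isclose(a, angles[0]) for a in angles)
        return equal_sides and equal_angles

    # A square the program has never seen is still classified correctly, because
    # the program checks the definition rather than a list of known shapes.
    square = {"kind": "polygon", "side_lengths": [2, 2, 2, 2], "angles": [90, 90, 90, 90]}
    print(is_regular_polygon(square))  # True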

One way to do that is data storage and retrieval: for example, a database. Consider the shape classifier problem again. You might put a bunch of shapes into a database, recording for each one whether or not it is a regular polygon. Once the database is populated, classifying a shape simply becomes a matter of looking it up: the database will say whether or not it is a regular polygon.

But what if it's not there? A database has the advantage of being able to learn from examples. But it has a big disadvantage: if it hasn't seen an example before, and is asked about it, it has no idea what the right answer is. So while data storage and retrieval is a very useful computing technique, and it is the backbone of most of our modern information systems, it has its limits. Wouldn't it be nice if a classifier system could provide a useful answer for input data that it's never seen before, without a programmer to tell it how?
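
Again as a rough sketch (the dictionary and names below are just for illustration), the lookup approach amounts to storing the answers for the shapes you have already seen, and having nothing useful to say about anything else:

    # A toy "database" of previously seen shapes and their labels. A real system
    # would use an actual database table, but a dictionary shows the idea.
    known_shapes = {
        ("polygon", (2, 2, 2, 2)): True,    # square: regular
        ("polygon", (3, 4, 5)):    False,   # scalene triangle: not regular
        ("polygon", (1, 1, 1)):    True,    # equilateral triangle: regular
    }

    def classify_by_lookup(kind, side_lengths):
        """Return the stored answer, or None for anything never seen before."""
        return known_shapes.get((kind, tuple(side_lengths)))

    print(classify_by_lookup("polygon", [1, 1, 1]))   # True: it is in the database
    print(classify_by_lookup("polygon", [5, 5, 5]))   # None: never seen, no answer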

Deep learning does exactly this. Like data storage and retrieval, it learns from examples, through training. Very roughly, a neural network being trained is given some input data and is told what output it should produce when it sees that data in the future. These input and output constraints propagate forward and backward through the network, and are used to adjust the network's internal values (its weights) so that the next time the network sees input like that, it will produce the matching output.

The key advantage of this technique is that if it sees data that is similar to, but not the same as, data it has been trained on, it will produce output similar to the trained output. This is very important: like programming, it can work on input it has never seen, but like databases, it can learn from examples and need not be coded by a programmer who anticipates all the details in advance. For our shape example, if trained with many examples of regular and irregular polygons, the neural network will be able to figure out whether or not a given input is a regular polygon, and, perhaps even more interestingly, it will be able to note that a circle is very much like a regular polygon, even if it has never been trained on a circle.
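
Here is a minimal sketch of that train-then-generalize idea, assuming the PyTorch library and a made-up encoding of each shape as three numeric features (how many sides it has, how much its side lengths vary, how much its angles vary). It is only an illustration of the kind of thing a neural network does, not the networks or data used in the research described above:

    import torch
    from torch import nn

    # Each shape is encoded as three made-up features:
    #   [number of sides / 100, spread of side lengths, spread of interior angles]
    # The label is 1.0 for a regular polygon, 0.0 otherwise.
    train_x = torch.tensor([
        [0.03, 0.00, 0.00],   # equilateral triangle     -> regular
        [0.04, 0.00, 0.00],   # square                   -> regular
        [0.06, 0.00, 0.00],   # regular hexagon          -> regular
        [0.03, 0.40, 0.30],   # scalene triangle         -> not regular
        [0.04, 0.25, 0.20],   # irregular quadrilateral  -> not regular
        [0.05, 0.50, 0.45],   # irregular pentagon       -> not regular
    ])
    train_y = torch.tensor([[1.0], [1.0], [1.0], [0.0], [0.0], [0.0]])

    # A tiny network; its "internal values" are the weights in these layers.
    model = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 1))
    loss_fn = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

    for epoch in range(500):
        optimizer.zero_grad()
        loss = loss_fn(model(train_x), train_y)  # forward pass: compare output to target
        loss.backward()                          # backward pass: propagate the error
        optimizer.step()                         # adjust the weights accordingly

    # A circle was never in the training data, but if we encode it as a shape with
    # many sides and no variation, its features resemble the regular polygons.
    circle_like = torch.tensor([[0.10, 0.00, 0.00]])
    print(torch.sigmoid(model(circle_like)))  # typically close to 1: "looks regular"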

Moreover, a deep learning neural network can learn from its own results, using a technique called reinforcement learning. Here, a neural network is used to derive output from some input, the results are tested to see how well they work, and the network is retrained accordingly. In this way a neural network can "learn from its own mistakes", training itself iteratively to do better and better. For example, a model of a walking human, given some simple programming to teach it the laws of physics, can use reinforcement learning to teach itself how to walk: a few years ago, some of the researchers in our department did exactly that. Another example: Google got a lot of attention a few years ago when its researchers built a deep learning system that used reinforcement learning to become a champion at the game of Go, a game very hard to computerize using traditional techniques, and proved it by beating the reigning Go world champion.
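
As a rough illustration of that try-evaluate-update loop, here is a tiny tabular Q-learning example on a made-up toy problem. It uses a simple table of values rather than a deep neural network, and it has nothing to do with the walking or Go systems mentioned above, but the loop (act, see how well it worked, adjust, repeat) is the same basic idea:

    import random

    # A toy problem: positions 0..5 on a line; reaching position 5 earns a reward.
    # Actions: 0 = step left, 1 = step right.
    N_STATES, GOAL = 6, 5

    def take_step(state, action):
        next_state = max(0, min(GOAL, state + (1 if action == 1 else -1)))
        reward = 1.0 if next_state == GOAL else 0.0
        return next_state, reward, next_state == GOAL

    # The "learner" here is a simple table of values, one per (state, action) pair.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    alpha, gamma, epsilon = 0.5, 0.9, 0.5

    for episode in range(200):
        state, done = 0, False
        while not done:
            # Try something: usually the current best guess, sometimes explore.
            if random.random() < epsilon:
                action = random.randrange(2)
            else:
                action = q[state].index(max(q[state]))
            next_state, reward, done = take_step(state, action)
            # See how well it worked, and adjust the stored estimate accordingly.
            q[state][action] += alpha * (reward + gamma * max(q[next_state]) - q[state][action])
            state = next_state

    # After training, the learned behaviour is to walk straight toward the goal.
    print([row.index(max(row)) for row in q[:GOAL]])  # typically [1, 1, 1, 1, 1]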

It seems clear to me at this point that deep learning is as fundamental a computing technique as computer programming and databases in building practical computer systems. It is enormously powerful, and is causing a great deal of legitimate excitement. Like all computer science techniques, it has its advantages and drawbacks, but its strengths are where other computer science techniques have weaknesses, and so it is changing computer science (and data science more generally) in dramatic ways. It's an interesting time to be a computer scientist, and I can't even begin to imagine the many things that bright and innovative people will be able to do with it in the future.

/it permanent link


Blosxom