CSC321 Winter 2015: Introduction to Neural Networks

Announcements

We will post important class announcements on this page. Please check this page at least once per week.

**No lectures, tutorials, or office hours will be held for the duration of the CUPE 3902 Unit 1 strike.** You should have received an e-mail with further details.

We’ve just fixed a major typo in our Lecture 10 slides. In the slide on Dealing with exploding and vanishing gradients, it should read “Reverse the input or output sequence,” not “Reverse the training or test sequence.”

Thanks for giving us your feedback two weeks ago! We’re making a few changes to the second half of the course based on your responses:

1. Many of you felt the videos were too high-level or hard to connect to the in-class lectures, and many also requested more worked-through examples. We’ll have roughly 3 weeks’ worth of new content in the second half of the course, and for these weeks we’ll provide targeted readings in place of the videos. These readings will introduce fewer concepts than the videos, but will be more mathematical and will include many more worked-through examples.

2. Some of you felt the timeframe for Assignment 1 was too short. In the future, we’ll try to release the assignments 2 weeks ahead of the due date, while keeping them at a similar difficulty level.

3. In addition, many of you felt the classes were too rushed, and there wasn’t enough time to finish the exercises. Some of you also requested seeing worked-through examples before doing the exercises yourself. The only solution is to add more lecture time, which we could do by eliminating the tutorials and using the tutorial slot as a third hour of lecture per week. Of course, not everyone would be happy with more class time. Therefore, in the lecture following the midterm, we will hold a vote on whether to use the tutorials as a third hour of lecture. We’ll cover the same material either way, but the extra hour would let us do it at a more leisurely pace.

As you know, we will have our midterm during lecture on Tuesday, Feb. 24. For the afternoon section it will be held from 1:10-2pm in the usual room (BA 1200). For the night section, it will be 6:10-7pm in the usual room (BA 1220).

First, the administrative details:

- The test will start at 1:10 or 6:10 sharp. If possible, please arrive early so that everyone can be seated with their test papers by the start time.
- The test is closed-book.
- Unless you have received permission from the instructors by e-mail, you must take the midterm in your assigned section. (The tests will be equivalent in coverage and difficulty, so there’s no advantage to taking one vs. the other.)
- Please bring photo ID (such as student ID). We will be checking.

The test will cover everything up to reading week:

- Coursera videos up through G (conv nets), except for the ones marked “optional”
- In-class lectures up through Lecture 12, “Recent advances in convolutional nets”
- Assignment 1

You are responsible for all of this material, but the hardest questions will be on things that were covered both in the videos and in class. The questions will be conceptual, and will generally ask you to justify your answers informally (rather than giving a formal proof).

Our in-class exercises have often involved long derivations, and most people didn’t quite have time to finish. For the test, by contrast, we intend for everyone to have ample time to think carefully about and answer all the questions. There will be approximately 10 questions: some are short (1-3 sentence) explanations, and some are similar to individual steps of the in-class exercises.

In the mid-course survey, many of you indicated that it was hard to relate different course topics to each other, or the in-class lectures to the videos. It’s true that we’ve often focused on one piece of the puzzle at a time, since it’s hard to think about all the different viewpoints simultaneously. But the pieces do fit together, and a good way to study for the exam will be to try to understand these relationships. Try to think about how our in-class problems make precise some of the high-level arguments from the Coursera videos. See also our announcement from [1/24/15].

Because of the holiday, the instructor office hours for Monday, Feb. 16 are moved to Tuesday, Feb. 17, from 2-4pm, in PT290C (the usual room). In addition, we will hold extra office hours on Friday, Feb 20, from 3-5 in PT290C.

Here are two old midterms you can use to practice. You can also try old exams from the U of T library old exam collection, but many of the questions will correspond to topics we haven’t covered.

- The midterm from 2013. Note that A1, A2, A6, B2 (b), and B6 correspond to topics we haven’t covered.
- Questions from the 2014 midterm. (We don’t have an electronic copy of the original formatted exam.)

Due to a mix-up on our part, we didn’t hold this week’s tutorial for the night section, where we were planning to go over Assignment 1. However, we will still hold tutorial tomorrow for the afternoon section as originally planned, and will do the post-mortem for Assignment 1. If you are in the night section, you are welcome to attend. Attendance at tutorials has generally been sparse enough that we don’t expect to have problems with seating. This will be 12:10-12:30 in BA 1200.

We know some of you won’t be able to make it, so we’ll also send out the solutions and other comments to the class mailing list.

Unfortunately, we are not yet done preparing Assignment 2, so we won’t introduce it in tutorial this week. Instead, we’ll release it either this weekend or early next week, and we’ll also discuss it during lecture for Week 7. You will have plenty of time to do the assignment: you’ll have 2 weeks, but it will be roughly the same length and difficulty as Assignment 1.

In the afternoon lectures, we’ve fallen about 15 minutes behind. (You’re asking lots of good questions, which is great!) To make up the lost time, Thursday’s lecture will start at 12:40, i.e. shortly after the tutorial.

There was a bug in the starter code that prevented it from reading in the data on Windows. If the code is working correctly for you, you can disregard this message.

The fixed version is now posted on the homework page. If you’ve already started, there’s no need to re-download: simply replace every occurrence of

`cPickle.load(open('something.pk'))`

with

`cPickle.load(open('something.pk', 'rb'))`
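For context on why the `'rb'` matters: on Windows, opening a file in text mode translates line endings, which corrupts the binary pickle stream. Here’s a minimal sketch of the round trip (the filename and data are made up for illustration, and we use Python 3’s `pickle`, which replaced Python 2’s `cPickle`):

```python
import pickle

# Illustrative data -- not the assignment's actual dataset.
data = {"vocab": ["a", "b", "c"], "dim": 16}

# 'wb'/'rb' open the file in binary mode, so no newline translation
# happens on Windows and the pickle bytes survive intact.
with open("example.pk", "wb") as f:
    pickle.dump(data, f)

with open("example.pk", "rb") as f:
    loaded = pickle.load(f)

print(loaded == data)  # True
```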

Assignment 1 is now posted on the homework page. It has you implement the backpropagation computations for the neural language model covered in Tuesday’s lecture. We’ll also give an introduction to the assignment in tutorial this week.

It’s due on Tuesday, Feb. 3, so you have just over a week to do it. The amount of code you need to write is very short, but the style of programming may be unfamiliar. Be sure to allow enough time.

We’ve just finished the first three weeks of the course, which built up to the backpropagation algorithm for training neural nets. This is a really powerful learning algorithm which we’ll be using throughout the course. The discussion so far has been somewhat abstract, but it will get more concrete in the next 3 weeks, as we switch to looking at particular examples of neural net applications.

This past week was probably one of the most difficult of the course, in terms of the number of new concepts introduced. We’ve looked at backpropagation from three different perspectives:

- the geometric perspective, where we visualized the level sets (contours) of the objective function as well as the gradient descent updates. This motivates why we compute the gradient, and also shows why it can behave badly when you choose too large a step size or when the level sets are elongated. (This geometric picture is part of why we spent so much time on weight space visualizations in Week 2.)
- the algebraic perspective, where we defined the gradient as the vector of partial derivatives with respect to weights and biases, and saw how we can compute these partial derivatives using backpropagation.
- the implementational perspective, where we rewrote the backpropagation computations in terms of matrix and vector operations so that we can implement the algorithm efficiently in Python.

It’s important that you understand all three perspectives and how they relate to each other. Throughout the course, we’ll be discussing issues which arise during training and which are best thought of through one of these perspectives. Being able to move between different levels of abstraction is part of being a good computer scientist.
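To make the implementational perspective concrete, here is a minimal sketch of the backward pass for a tiny one-hidden-layer net with squared error, written entirely as matrix/vector operations and sanity-checked against a finite difference. All the names and sizes here are illustrative, not the assignment’s actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up sizes: 5 examples, 3 inputs, 4 hidden units, 1 output.
X = rng.standard_normal((5, 3))
t = rng.standard_normal((5, 1))
W1 = rng.standard_normal((3, 4)) * 0.1
W2 = rng.standard_normal((4, 1)) * 0.1

def forward(W1, W2):
    h = np.tanh(X @ W1)                 # hidden activations
    y = h @ W2                          # linear output
    loss = 0.5 * np.sum((y - t) ** 2)   # squared error
    return h, y, loss

# Forward pass, then backprop as matrix operations.
h, y, loss = forward(W1, W2)
dy = y - t                 # dL/dy
dW2 = h.T @ dy             # dL/dW2
dh = dy @ W2.T             # dL/dh
dz = dh * (1 - h ** 2)     # backprop through tanh
dW1 = X.T @ dz             # dL/dW1

# Check one entry of dW1 against a one-sided finite difference.
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
num = (forward(W1p, W2)[2] - loss) / eps
print(abs(num - dW1[0, 0]) < 1e-4)
```

The finite-difference check is a good habit whenever you implement a gradient by hand: if the analytic and numerical values disagree, you’ve made an algebra or indexing mistake somewhere.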

We’ll also say a few words about the structure of the course, since our inverted classroom format is kind of unusual. Geoff Hinton has produced a very nice set of Coursera lecture videos which give a high-level intuition for the concepts in the course. This intuition might sometimes seem like magic, but really it comes from spending time playing around with the algorithms and working through simple cases by hand.

What we’re trying to do during class time is to give you the mathematical tools you need to reason about the algorithms yourself. We’re not trying to force you to learn twice as much material. Geoff’s lectures implicitly make use of concepts and problem solving techniques we cover during class, even if he doesn’t always talk about them explicitly. Therefore, we’d recommend reviewing the Coursera slides after we discuss the material in class. Hopefully you’ll be able to fill in the details the second time around, and not have to take everything on faith.

In terms of the Coursera quizzes, this is the first year that they’re due before (rather than after) the course meetings that cover the material. Even though they are open book and you have two attempts, they are still pretty challenging given that we haven’t yet discussed the material. The average second-attempt score on the quizzes so far has generally been between 70 and 90 percent. If it drops much below that, we’ll curve up the quiz scores — we certainly don’t want this component of the evaluation to be punitive.

During class, we give you challenging exercises which are a little bit beyond what you can do in the time provided. Most people don’t completely finish, and we don’t expect you to. Part of this, unfortunately, is the time frame: 50-minute classes are just a bit too short to work through 2 problems completely. But we also give challenging problems because that’s how you learn. You should discuss the problems with your neighbors, since that’s a good way to get unstuck. Hopefully, even if you don’t finish, you will have made enough progress (or false starts) that the discussion afterward will make sense. Then you can review the solution at your leisure once the slides are posted.

We hope the inverted classroom format, with the quizzes and in-class problems, hasn’t made the course more stressful. The difference is that you’re getting feedback on your understanding throughout the term, rather than at the end, which forces you to stay on your toes. But hopefully this also has the effect that there’s less of a mad rush right before assignment deadlines and exams.

Don’t forget that we hold office hours Monday afternoons from 2 to 4. We encourage you to come if you have questions.

The slides for both of this week’s lectures and the tutorial are now posted on the calendar page. These slides include the solutions to all the problems we discussed during class, as well as clarifications based on questions people asked during class. The videos and quiz for next week are accessible through Coursera, and as usual there are some notes to supplement the lecture videos.

We’ve also posted some details on the homework page about installing and using Python and NumPy, as well as some recommended readings on Python/NumPy programming. You’re not going to need this until the first assignment, but it’s there in case you want to start preparing early.

The main theme of this week was the geometry of input space, feature space, and weight space. We’re going to use these spaces a lot throughout the course, so it’s very important that you be comfortable thinking in all of these spaces. Even though we’ll be using very high-dimensional spaces (e.g. thousands of dimensions) throughout the course, being able to sketch the 2-dimensional versions is an important source of intuition.

If you’ve forgotten how to plot linear inequalities in 2 dimensions, we’d recommend reviewing it on Khan Academy. Then go back to the slides from Week 2 (both in-class and Coursera) and try to understand how the algebra of the models and algorithms maps to the geometric representations we drew. If you’re stuck, please come to office hours on Monday — we’re going to build on this material in next week’s class meetings.
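As a small illustration of how the algebra maps to the geometry: a linear decision boundary in 2-D input space is the line w · x + b = 0, and the inequality w · x + b ≥ 0 picks out one of the two half-planes. The numbers below are made up for illustration:

```python
import numpy as np

# A hypothetical weight vector and bias defining the line w . x + b = 0.
w = np.array([1.0, -2.0])
b = 1.0

points = np.array([[0.0, 0.0],   # w.x + b = 1  -> positive half-plane
                   [3.0, 3.0],   # w.x + b = -2 -> negative half-plane
                   [1.0, 1.0]])  # w.x + b = 0  -> on the boundary

side = points @ w + b
print(side)  # [ 1. -2.  0.]
```

Sketching a few examples like this by hand, and checking them against the arithmetic, is exactly the kind of practice that makes the weight-space pictures from Week 2 feel natural.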

We’ve updated the slides for Lecture 2 to add some additional clarifications. If you’ve downloaded them before tonight, you’ll want to download the new version.

Welcome to CSC321!

All of you should now be able to access the Coursera page using your UTorID, including those of you who are still wait-listed. To make sure you’re looking at the correct session (rather than 2013 or 2014), **please check that the instructor list reads “Geoffrey Hinton, Roger Grosse, Nitish Srivastava”.** (If you tried to sign up earlier today, you may have been directed to the wrong session — if so, we apologize for the hiccup.) Please let us know (csc321prof [at] cs.toronto.edu) if you’re still having problems signing up.

The slides from the first two lectures have now been posted on the calendar page. In general, we will try to post the slides shortly after the class meetings.

If you want to request a prerequisite waiver, **you must e-mail us (csc321prof [at] cs.toronto.edu) by 11:59pm on Thursday.**