Let's run through the analysis with a concrete situation. There are only two input dimensions, so that we can draw everything on paper. There is no bias: the only learnable parameters are w1 and w2. There are only two data cases, to keep the story short. The first training case has input values (1, -2), and the correct output is 1. We're going to draw the weight space, as was done in the video, step by step.

Step 1. Draw a vector to show the input values of the first training case. Draw it as a dot, with an arrow from the origin to that dot.

Step 2. Draw the line that separates good weight vectors (which correctly classify the first training case) from bad weight vectors (which incorrectly classify it). With an arrow, indicate which side is the "good" side. This good half-plane (halfspace) is where we'll want to end up.

The second training case has input values (0, -1), and the correct output is 0.

Step 3. Draw the vector of the second training case.

Step 4. Draw the halfspace of weight vectors that are "good" for this second training case. Again, use an arrow to indicate which halfspace is the good one.

Step 5. Now that we've drawn all training cases, indicate the "feasible region" of weight vectors.

Step 6. Now we're going to train the perceptron. Let's say that the initial weight vector is (0, -2). The perceptron cycles through the training cases in order: first it looks at the first one, then at the second one, then again at the first one, etcetera. (The unit outputs 1 whenever the weighted sum of its inputs is at least zero, and 0 otherwise.) How does training proceed? What is the final weight vector? Reminder: the data is (1, -2) -> 1 and (0, -1) -> 0. Answer: w1 = 2, w2 = 1. (A short code sketch that replays this run appears at the end of the exercise.)

Step 7. Is the final weight vector in the feasible region? Is it in the generously feasible region?

Theoretical question: what kind of situation has feasible weight vectors, but not generously feasible weight vectors?
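Below is a minimal Python sketch, not part of the original exercise, that replays the Step 6 training run and the Step 7 checks. It assumes the binary threshold convention stated above (output 1 when the weighted input sum is at least zero), the standard perceptron learning rule (add the input vector when the output should have been 1, subtract it when it should have been 0), and an arbitrary margin of 1 for the generous-feasibility check; names like `perceptron_step` are invented for illustration.

```python
cases = [((1, -2), 1),   # first training case: input (1, -2), target 1
         ((0, -1), 0)]   # second training case: input (0, -1), target 0

def output(w, x):
    """Binary threshold unit: 1 if the weighted input sum is >= 0, else 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] >= 0 else 0

def perceptron_step(w, x, target):
    """Apply the perceptron learning rule to one training case."""
    y = output(w, x)
    if y == target:
        return w                              # correct: leave the weights alone
    if target == 1:                           # said 0 but should have said 1: add the input
        return (w[0] + x[0], w[1] + x[1])
    return (w[0] - x[0], w[1] - x[1])         # said 1 but should have said 0: subtract the input

w = (0, -2)                                   # initial weight vector from Step 6
for sweep in range(25):                       # cycle through the cases in order
    before = w
    for x, target in cases:
        w = perceptron_step(w, x, target)
    if w == before:                           # a full sweep with no update: both cases correct
        break

print("final weights:", w)                    # expected: (2, 1)

# Step 7 checks. Case 1 needs w1 - 2*w2 >= 0; case 2 needs -w2 < 0, i.e. w2 > 0.
w1, w2 = w
print("feasible:", w1 - 2 * w2 >= 0 and w2 > 0)
margin = 1                                    # illustrative margin for "generously feasible"
print("generously feasible (margin 1):", w1 - 2 * w2 >= margin and w2 >= margin)
```

Running the sketch reproduces the stated answer (2, 1) and shows that these weights satisfy both constraints, but sit exactly on the boundary of the first case's constraint (the weighted sum is 0 there), so they fail the margin check. That boundary-versus-margin contrast is exactly what Step 7 and the theoretical question are probing.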