Slide 11 of Lecture 6 shows a simple network: no hidden units, just one output unit (which is linear), and no biases. It has only two learnable parameters, w1 and w2, so for an input (x1, x2) the network's output is y = w1*x1 + w2*x2. Below are some training sets that we could use with that network. For each training set, describe the error surface:

1. Is there a setting of the weights that achieves zero error? If so, what setting(s)?
2. Draw a contour plot of the error surface, with w1 on the horizontal ('x') axis and w2 on the vertical ('y') axis.
3. Describe the surface in English. Is it a quadratic bowl? Is it some other shape? Describe any differences between the two axes.

(a) One training case:
    (0, 0) -> 5

(b) One training case:
    (0, 2) -> 10

(c) One training case:
    (20, 0) -> 100

(d) Two training cases:
    (0, 2) -> 10
    (20, 0) -> 100

(e) Three training cases:
    (0, 2) -> 10
    (20, 0) -> 100
    (0, 0) -> 20

(f) One training case:
    (0.01, 0.02) -> 70

(g) Three training cases:
    (0, 2) -> 10
    (20, 0) -> 100
    (0.01, 0.02) -> 70

(h) Two training cases:
    (0, 2) -> 10
    (0, 3) -> 100
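You can check your hand-drawn contour plots numerically. Below is a minimal sketch that evaluates the error over a grid of (w1, w2) values, assuming the error is the sum of squared differences between the output y = w1*x1 + w2*x2 and the target (if the lecture uses a factor of 1/2, that only rescales the contour labels, not their shape). The function name, grid ranges, and grid resolution here are illustrative choices, not part of the assignment.

```python
import numpy as np

def error_surface(training_set, w1_range, w2_range, n=201):
    """Sum-of-squared-errors of the linear unit y = w1*x1 + w2*x2,
    evaluated on an n-by-n grid of (w1, w2) values."""
    w1, w2 = np.meshgrid(np.linspace(*w1_range, n),
                         np.linspace(*w2_range, n))
    err = np.zeros_like(w1)
    for (x1, x2), t in training_set:
        # Each case contributes its own squared residual to the surface.
        err += (x1 * w1 + x2 * w2 - t) ** 2
    return w1, w2, err

# Example: training set (d), with two cases.
cases = [((0, 2), 10), ((20, 0), 100)]
w1, w2, err = error_surface(cases, (-10, 20), (-10, 20))

# Grid location of the lowest error found.
i, j = np.unravel_index(err.argmin(), err.shape)
print(w1[i, j], w2[i, j], err[i, j])
```

Passing the result to matplotlib's `plt.contour(w1, w2, err)` produces the contour plot asked for in question 2, and scanning `err` for (near-)zero entries is a quick sanity check on your answer to question 1.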