{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# From Pigeons to Artificial Neural Networks\n",
    "\n",
    "Artificial neural networks draw inspiration from real neural networks.\n",
    "In order to understand deep learning using a \"top-down\"\n",
    "approach, we'll start with a familiar system capable of \n",
    "learning: a pigeon.\n",
    "\n",
    "In 2015, researchers Levenson et al. from the University of California Davis\n",
    "trained pigeons to detect breast cancer. They show that the common pigeon \n",
    "can reliably distinguish between benign versus malignant tumors.\n",
    "The experimental setup is shown below.\n",
    "\n",
    "<img src=\"imgs/pigeon.jpg\" width=\"500\" />\n",
    "\n",
    "The researchers trained pigeons by showing an image of a magnified\n",
    "biopsy to a pigeon. The pigeon then pecks at one of two answer\n",
    "buttons, labelling the image as malignant (cancerous) or benign \n",
    "(not cancerous). If the pigeon chooses correctly, researchers reward\n",
    "it with a tasty food pellet.\n",
    "\n",
    "You can imagine that at the very beginning, the pigeons might peck\n",
    "randomly, perhaps not even pecking at the buttons at all. Eventually,\n",
    "the pigeon might accidentally peck at the correct button, and see a\n",
    "food pellet. This food pellet is extremely important, and is what\n",
    "guides the pigeon to change its behaviour.\n",
    "\n",
    "In a sense, training an artificial neural network is like training\n",
    "a pigeon. In both cases, we need to answer questions like:\n",
    "\n",
    "1. How will we reward the pigeon (neural network)?\n",
    "2. How do we train the pigeon (neural network)\n",
    "   quickly and efficiently? \n",
    "3. How do we know that the pigeon (neural network) did not just\n",
    "   memorize the pictures we show it?\n",
    "4. Are there ethical issues in trusting a pigeon (neural network)\n",
    "   to detect cancer?\n",
    "  \n",
    "In this chapter, we'll build an artificial pigeon instead of using\n",
    "a real one. Also, instead of using pigeons to detect cancer, we'll\n",
    "work on a simpler problem of categorizing digits.\n",
    "In order to use an artificial pigeon to solve our classification\n",
    "problem, we will need to:\n",
    "\n",
    "1. Build an artificial pigeon -- or rather, an *artificial pigeon brain*\n",
    "2. Decide how to reward the artificial pigeon\n",
    "3. Decide how to train the artificial pigeon\n",
    "4. Determine how well our artificial pigeon performs the classification task\n",
    "\n",
    "We will look at each problem in turn.\n",
    "\n",
    "## The Pigeon Brain\n",
    "\n",
    "How does a pigeon \"work\"? How were pigeons able to link the visual features\n",
    "of the biopsy images to receiving the food pellets? We don't know everything\n",
    "about how pigeons work, but we do know this much:\n",
    "\n",
    "1. The light emitted from the screen showing the biopsy images reaches the pigeon's retina.\n",
    "2. Each retinal cell sends signals to neurons attached to it.\n",
    "3. The neurons pass the information to other neurons that are a part of the bird's brain.\n",
    "4. The brain makes a decision about what action to take.\n",
    "5. Neurons send signals to various parts of the pigeon's body, which results in the pigeon pecking a button (or not).\n",
    "6. The pigeon observes the response of its environment. Its neural pathways and biochemistry change to adapt to the environment.\n",
    "\n",
    "Much of these steps remain a mystery as of early 2019.\n",
    "However, we do know a few things about how biological neurons work, and we will\n",
    "use that knowledge to build a mathematical model of an artificial neuron.\n",
    "\n",
    "<img src=\"imgs/Blausen_0657_MultipolarNeuron.png\" width=\"500\" />\n",
    "\n",
    "The figure above shows the anatomy of a brain cell, called a **neuron**.\n",
    "For our purposes, the most important parts of the neuron are:\n",
    "\n",
    "- The **dendrites**, which are connected to other cells that provides information.\n",
    "- The **cell body**, which consolidates information from the dendrites.\n",
    "- The **axon**, which is an extension from the cell body that passes information to other cells.\n",
    "- The **synapse**, which is the area where the axon of one neuron and the dendrite of another connect.\n",
    "\n",
    "Neurons pass information using **action potentials**. When a neuron is \"at rest\",\n",
    "there is a small voltage difference between the inside and outside of the cell.\n",
    "When a neuron receives \"information\" in its dendrites, the voltage difference along\n",
    "that part of the cell lowers. If the total activity in a neuron's dendrites lowers\n",
    "the voltage difference enough, the entire cell *depolarizes*. In other words,\n",
    "the neuron **fires**. The voltage signal spreads along the axon and to the synapse,\n",
    "then to the next cells. Depending\n",
    "on the total activity at the next cells' dendrites, the signal might continue to\n",
    "propagate.\n",
    "\n",
    "What does it mean when a particular neuron fires? This question, known as\n",
    "**neural decoding**, is very difficult to answer.  However, neuroscientists do\n",
    "know that cells in the *optic pathway* fire more in response to various visual\n",
    "stimuli. There are neurons that fire more when a particular retinal cell is\n",
    "excited. There are also neurons that fire more in response to specific edges,\n",
    "lines, angles, and movements. There are neurons in monkeys that fire\n",
    "selectively to hands and faces. In 2005, studies found evidence of cells that\n",
    "fire in response to particular people like Bill Clinton or Jennifer Aniston.\n",
    "These studies lead to the hypothesis of a \"grandmother cell\", a neuron that\n",
    "represents a complex but specific concept or object.\n",
    "\n",
    "The existence of such \"grandmother cells\" is still contested. Many\n",
    "believe that neuron firing patterns encode information only in a **distributed** fashion.\n",
    "That is, the firing of a single neuron does not have a particular meaning, but\n",
    "the firing pattern of a group of neurons do. The idea is akin to how if you\n",
    "look at the bit patterns of an encrypted file, the individual bit values mean very\n",
    "little on their own, without considerations of other bits in the file.\n",
    "\n",
    "## An Artificial Pigeon Brain\n",
    "\n",
    "For the purpose of modelling an artificial brain, we will make \n",
    "make the simplifying assumption that a \"grandmother cell\" exists. This cell\n",
    "is the *output* of our network, representing the prediction we wish to make.\n",
    "That is, if we are hoping to classify malignant vs benign biopsy scans, we will have\n",
    "an output neuron that will (hopefully) only fire in the presence of a malignant tumor,\n",
    "not a benign one. Our goal when training this *artificial neural network* is \n",
    "to make this output neuron behave the way we want.\n",
    "\n",
    "We will also need to model neurons that read the *input*. In a biological brain,\n",
    "there are neurons that connect directly to retinal cells and activate when\n",
    "those cells activate. For our purposes, we will also have *input* neurons, one for\n",
    "each *pixel* in the image.\n",
    "\n",
    "The input neurons need to be connected to the output in some way. In a biological brain,\n",
    "the connections between neurons are messy, and can contain loops. In our case,\n",
    "we will use a \"layered\" network, like this:\n",
    "\n",
    "<img src=\"imgs/fcn.png\" width=\"200\" />\n",
    "\n",
    "Each neuron will belong to a \"layer\", with the input neurons belonging to\n",
    "the first layer, and the output neuron belonging to the last layer. Each\n",
    "neuron will be connected to other neurons in the layers below and above.\n",
    "In the image above, there is an input layer, a **hidden layer**,\n",
    "and an output layer. The hidden layer receives information from the input layer,\n",
    "and passes information the output layer. In theory, we can have as many hidden layers\n",
    "as we want. The more hidden layers we have, the **deeper** our network.\n",
    "\n",
    "This \"layered\" **neural network architecture** is called a **fully-connected,\n",
    "feed-forward network**. It is **fully-connected** because neurons connect to \n",
    "all other neurons in the preceding and succeeding layer. It is **feed-forward**\n",
    "because information only flows in one direction: there is no information flow\n",
    "from a later layer back to an earlier layer. For historical reasons, this\n",
    "network architecture is also called a **multi-layer perceptron** (MLP).\n",
    "\n",
    "We have still yet to model the individual neurons themselves. We will use real numbers\n",
    "to represent the firing rate of neurons, or the neuron's **activation**.\n",
    "We interpret a high activation value to mean a very active neuron, and a low\n",
    "activation value to mean an inactive neuron.\n",
    "The activation of the input neuron is set to the intensity of\n",
    "the corresponding pixel. \n",
    "\n",
    "The degree to which a biological neuron's activation influences another is\n",
    "complex, and depends on the presence and absence of neurotransmitters, and other\n",
    "biochemical and biophysical factors. In our artificial neuron, we will summarize\n",
    "all those factors into one real number, called a **weight**.\n",
    "This number can be large or small, and\n",
    "can be positive or negative. The strength of connections between neurons will\n",
    "change and adjust during training.\n",
    "\n",
    "So far, the activation of a neuron is a scaled sum of the activations of\n",
    "the neurons in the layer below. However, in a biological neuron, the total\n",
    "sum of activity in the dendrites needs to be above a threshold in order for\n",
    "the neuron to fire. We will model the (negative) threshold as another real\n",
    "value that we add (subtract) from the activation, and set all activation\n",
    "below zero to be zero. The threshold numbers, called *biases*, are different\n",
    "for each neuron, and will also change during training.\n",
    "\n",
    "Here is an implementation of the artificial pigeon brain in PyTorch.\n",
    "Don't worry if this code or the explanations don't make sense yet."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.nn.functional as F\n",
    "import matplotlib.pyplot as plt # for plotting\n",
    "\n",
    "torch.manual_seed(1) # set the random seed\n",
    "\n",
    "class Pigeon(nn.Module):\n",
    "    def __init__(self):\n",
    "        super(Pigeon, self).__init__()\n",
    "        self.layer1 = nn.Linear(28 * 28, 30)\n",
    "        self.layer2 = nn.Linear(30, 1)\n",
    "    def forward(self, img):\n",
    "        flattened = img.view(-1, 28 * 28)\n",
    "        activation1 = self.layer1(flattened)\n",
    "        activation1 = F.relu(activation1)\n",
    "        activation2 = self.layer2(activation1)\n",
    "        return activation2\n",
    "\n",
    "pigeon = Pigeon()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this network, there are 28x28 = 784 input neurons, suggesting that our\n",
    "input image should be 28x28 pixels large. We have a single output neuron.\n",
    "We also have 30 neurons in the hidden layer.\n",
    "\n",
    "The variable `pigeon.layer1` contains information about the connectivity\n",
    "between the input layer and the hidden layer (stored as a matrix), and the\n",
    "hidden layer biases (stored as a vector).\n",
    "Similarly, the variable `pigeon.layer2` contains information about the weights\n",
    "between the hidden layer and the output layer, and the output neuron's bias.\n",
    "The weights and biases adjust during training, so they are called the model's\n",
    "**parameters**.\n",
    "We can introspect their values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for w in pigeon.layer2.parameters():\n",
    "    print(w)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training\n",
    "\n",
    "The weights that we see above were randomly initialized by PyTorch. Most likely,\n",
    "they are not suitable for whatever task we want the network to perform. So,\n",
    "we need to \"train\" the network: to adjust these weights (and biases) so that\n",
    "the network *does* do what we want.\n",
    "\n",
    "Training is where our \"pigeon analogy\" falls apart. We posit that the\n",
    "biochemistry and connectivity of biological neurons can change based on\n",
    "past experience. However, the way that we train our artificial neural network\n",
    "bears no resemblance to the way pigeons and other animals might learn.\n",
    "\n",
    "Here's what training will entail:\n",
    "\n",
    "1. We're going to ask our network to make a prediction for some input data, whose output we *already know*.\n",
    "2. We're going to compare the predict output to the ground truth, actual output.\n",
    "3. We're going to *adjust the parameters* to make the prediction closer to the ground truth. (This is the magic behind machine learning.)\n",
    "4. We'll repeat steps 1-3. (The question of when to stop is an interesting one, which we won't talk about yet.)\n",
    "\n",
    "In order to train the network, we need a set of input data to which we know the desired output.\n",
    "\n",
    "\n",
    "## Digit Recognition\n",
    "\n",
    "We will train this \"artificial pigeon\" to perform a digit recognition\n",
    "task. That is, we will use the MNIST dataset of hand-written digits, and train\n",
    "the pigeon to **recognize a small digit, namely a digit that is less than 3**.\n",
    "This problem is a **binary classification problem** we want to predict\n",
    "which of two classes an input image is a part of.\n",
    "\n",
    "The MNIST dataset contains hand-written digits that are 28x28 pixels large.\n",
    "Here are a few digits in the dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from torchvision import datasets, transforms\n",
    "\n",
    "# load the training data\n",
    "mnist_train = datasets.MNIST('data', train=True, download=True)\n",
    "mnist_train = list(mnist_train)[:2000]\n",
    "\n",
    "# plot the first 18 images in the training data\n",
    "for k, (image, label) in enumerate(mnist_train[:18]):\n",
    "    plt.subplot(3, 6, k+1)\n",
    "    plt.imshow(image)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is an example of using the network to classify whether the last\n",
    "image contains a small digit. Again, since we haven't trained the network\n",
    "yet, the predicted probability of the image containing a small digit \n",
    "is close to half. The \"pigeon\" is unsure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# transform the image data type to a 28x28 matrix of numbers\n",
    "img_to_tensor = transforms.ToTensor()\n",
    "\n",
    "inval = img_to_tensor(image)\n",
    "outval = pigeon(inval)       # find the output activation given input\n",
    "prob = torch.sigmoid(outval) # turn the activation into a probability\n",
    "print(prob)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In fact, if we show the network different images, the predicted probabilities\n",
    "will be very similar, and will be around half."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for k, (image, label) in enumerate(mnist_train[:10]):\n",
    "    print(torch.sigmoid(pigeon(img_to_tensor(image))))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In order for the network to be useful, we need to actually train it, so\n",
    "that the weights are actually meaningful, non-random values. As we mentioned\n",
    "before, we'll use the network to make predictions, then compare the predictions\n",
    "agains the ground truth. But how do we compare the predictions against\n",
    "the ground truth? We'll need a few more things. In particular, we need to:\n",
    "\n",
    "1. Specifying a \"reward\" or a **loss** (negative reward) that judges how good or bad the prediction was, compared to the **ground truth** actual value.\n",
    "2. Specifying an **optimizer** that tunes the **parameters** to improve the reward or loss.\n",
    "\n",
    "Choosing a good **loss function** $L(actual, predicted)$ \n",
    "for a problem is not a trivial task.\n",
    "The definition of a loss function also transforms a **classification problem**\n",
    "into an **optimization** problem: what set of parameters\n",
    "minimizes the loss (or maximizes the reward) across the training examples?\n",
    "\n",
    "Turning a learning problem into an optimization problem\n",
    "is actually a very subtle but important step in many machine learning tools,\n",
    "because it allows us to use tools from mathematical optimization.\n",
    "\n",
    "That there are **optimizers** that can tune the network parameters for\n",
    "us is also really, really cool. Unfortunately, we won't talk much about\n",
    "optimizers and how they work in this course. You should know though that\n",
    "these optimizers are what makes machine learning work at all.\n",
    "\n",
    "For now, we will choose a standard loss function for a binary classification\n",
    "problem: the **binary cross-entropy loss**. We'll also choose\n",
    "a **stochastic gradient descent** optimizer. We'll talk about\n",
    "what these mean later in the course."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch.optim as optim\n",
    "\n",
    "criterion = nn.BCEWithLogitsLoss()\n",
    "optimizer = optim.SGD(pigeon.parameters(), lr=0.005, momentum=0.9)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we can start to train the pigeon network, similar to the way we would train\n",
    "a real pigeon:\n",
    "\n",
    "1. We'll show the network pictures of digits, one by one\n",
    "2. We'll see what the network predicts\n",
    "3. We'll check the loss function for that example digit, comparing the network prediction against the ground truth\n",
    "4. We'll make a small update to the parameters to try and improve the loss for that digit\n",
    "5. We'll continue doing this many times -- let's say 1000 times\n",
    "\n",
    "For simplicity, we'll use 1000 images, and show the network each image only once."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# simplified training code to train `pigeon` on the \"small digit recognition\" task\n",
    "\n",
    "for (image, label) in mnist_train[:1000]:\n",
    "    # actual ground truth: is the digit less than 3?\n",
    "    actual = (label < 3).reshape([1,1]).type(torch.FloatTensor)\n",
    "    # pigeon prediction\n",
    "    out = pigeon(img_to_tensor(image)) # step 1-2\n",
    "    # update the parameters based on the loss\n",
    "    loss = criterion(out, actual)      # step 3\n",
    "    loss.backward()                    # step 4 (compute the updates for each parameter)\n",
    "    optimizer.step()                   # step 4 (make the updates for each parameter)\n",
    "    optimizer.zero_grad()              # a clean up step for PyTorch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's see how the pigeon performs on the last image it was trained on:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# display the last training image\n",
    "plt.imshow(image)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's the predicted probability that the image is a small digit (less than 3)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "inval = img_to_tensor(image)\n",
    "outval = pigeon(img_to_tensor(image))\n",
    "prob = torch.sigmoid(outval)\n",
    "print(prob)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here are some more predictions for some of the digits we plotted eariler:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# predictions for the first 10 digits (we plotted the first 18 earlier)\n",
    "\n",
    "for (image, label) in mnist_train[:10]:\n",
    "    prob = torch.sigmoid(pigeon(img_to_tensor(image)))\n",
    "    print(\"Digit: {}, Predicted Prob: {}\".format(label, prob))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Not bad! We'll use the probability 50% as the cutoff for making a \n",
    "discrete prediction. Then, we can compute the accuracy on the 1000\n",
    "images we used to train the network."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# computing the error and accuracy on the training set\n",
    "\n",
    "error = 0\n",
    "for (image, label) in mnist_train[:1000]:\n",
    "    prob = torch.sigmoid(pigeon(img_to_tensor(image)))\n",
    "    if (prob < 0.5 and label < 3) or (prob >= 0.5 and label >= 3):\n",
    "        error += 1\n",
    "print(\"Training Error Rate:\", error/1000)\n",
    "print(\"Training Accuracy:\", 1 - error/1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The accuracy on those 1000 images is 96%, which is really good considering\n",
    "that we only showed the network each image only once.\n",
    "\n",
    "However, this accuracy is not representative of how well the network is doing,\n",
    "because the network was *trained* on the data. The network had a chance to\n",
    "see the actual answer, and learn from that answer. To get a better sense of\n",
    "the network's predictive accuracy, we should compute accuracy numbers on\n",
    "a **test set**: a set of images that were not seen in training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# computing the error and accuracy on a test set\n",
    "\n",
    "error = 0\n",
    "for (image, label) in mnist_train[1000:2000]:\n",
    "    prob = torch.sigmoid(pigeon(img_to_tensor(image)))\n",
    "    if (prob < 0.5 and label < 3) or (prob >= 0.5 and label >= 3):\n",
    "        error += 1\n",
    "print(\"Test Error Rate:\", error/1000)\n",
    "print(\"Test Accuracy:\", 1 - error/1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The test error rate is double the training error rate!\n",
    "\n",
    "## Overfitting\n",
    "\n",
    "To further illustrate the importance of having a separate **test set**,\n",
    "let's build another identical network `pigeon2`, but train it\n",
    "differently. This network `pigeon2` will be trained on just 10 images.\n",
    "We show the network each image 100 times so that we have the same number\n",
    "of overall training steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define another network:\n",
    "pigeon2 = Pigeon()\n",
    "# define an optimizer:\n",
    "optimi2 = optim.SGD(pigeon2.parameters(), lr=0.005, momentum=0.9)\n",
    "\n",
    "# training:\n",
    "for i in range(100):                        # repeat 100x\n",
    "    for (image, label) in mnist_train[:10]: # use the first 10 images to train\n",
    "        actual = (label < 3).reshape([1,1]).type(torch.FloatTensor)\n",
    "        out = pigeon2(img_to_tensor(image))\n",
    "        loss = criterion(out, actual)\n",
    "        loss.backward()\n",
    "        optimi2.step()\n",
    "        optimi2.zero_grad()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's check the accuracy on those 10 images:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# computing the error and accuracy on the training set\n",
    "\n",
    "error = 0\n",
    "for (image, label) in mnist_train[:10]:\n",
    "    prob = torch.sigmoid(pigeon2(img_to_tensor(image)))\n",
    "    if (prob < 0.5 and label < 3) or (prob >= 0.5 and label >= 3):\n",
    "        error += 1\n",
    "print(\"Training Error Rate:\", error/10)\n",
    "print(\"Training Accuracy:\", 1 - error/10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Look, we achieve perfect accuracy on those 10 images!\n",
    "But if we look at the test accuracy on the same test set as we used earlier,\n",
    "we do much worse."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# computing the error and accuracy on the test set\n",
    "\n",
    "error = 0\n",
    "for (image, label) in mnist_train[1000:2000]:\n",
    "    prob = torch.sigmoid(pigeon2(img_to_tensor(image)))\n",
    "    if (prob < 0.5 and label < 3) or (prob >= 0.5 and label >= 3):\n",
    "        error += 1\n",
    "print(\"Test Error Rate:\", error/1000)\n",
    "print(\"Test Accuracy:\", 1 - error/1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In fact, if we continue training on those 10 images,\n",
    "our test accuracy could actually **decrease**!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# train the network `pigeon2` for a bit longer\n",
    "# show each of the 10 images 50x\n",
    "\n",
    "for i in range(50):\n",
    "    for (image, label) in mnist_train[:10]:\n",
    "        actual = (label < 3).reshape([1,1]).type(torch.FloatTensor)\n",
    "        out = pigeon2(img_to_tensor(image))\n",
    "        loss = criterion(out, actual)\n",
    "        loss.backward()\n",
    "        optimi2.step()\n",
    "        optimi2.zero_grad()\n",
    "\n",
    "error = 0\n",
    "for (image, label) in mnist_train[1000:2000]:\n",
    "    prob = torch.sigmoid(pigeon2(img_to_tensor(image)))\n",
    "    if (prob < 0.5 and label < 3) or (prob >= 0.5 and label >= 3):\n",
    "        error += 1\n",
    "print(\"Test Error Rate:\", error/1000)\n",
    "print(\"Test Accuracy:\", 1 - error/1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instead of learning to identify image features that will **generalize**\n",
    "to unseen examples, the network is instead \"memorizing\" the training images.\n",
    "We say that a model **overfits** when it is learning the\n",
    "idiosyncrasies of the training set, rather than the features that\n",
    "generalize beyond the dataset.\n",
    "\n",
    "## Ethics of Using (real or artificial) Pigeons\n",
    "\n",
    "One can hardly imagine having a pigeon actually diagnose patients. One reason\n",
    "is that the authors of the study (Levenson et al.) did not show that pigeons outperform doctors. Doctors,\n",
    "after all, have much larger brains than pigeons, and are trained using much\n",
    "more than food pellets.\n",
    "\n",
    "Moreover, one can ask doctors to justify their reasoning\n",
    "and their decisions. There is no way to ask a pigeon why it chose to peck\n",
    "one button and not the other.\n",
    "Likewise, we don't know why the neural network\n",
    "made a particular decision, so there is no way of verifying whether its reasoning \n",
    "is correct.\n",
    "The lack of interpretability of artificial neural networks is one of the many\n",
    "reasons preventing its more widespread use.\n",
    "\n",
    "## Beyond Biological Pigeons\n",
    "\n",
    "Although biological pigeons motivated our network,\n",
    "training an artificial network is\n",
    "nothing like the kind of \"learning\" that might be biologically plausible.\n",
    "Artificial neural networks also do not have the same constraints as biological\n",
    "pigeons.\n",
    "For one, we can actually show a **batch** containing multiple images\n",
    "to the network at the same time, and only take one optimizer step for the\n",
    "entire batch. We can also train many different models, each with different\n",
    "settings, and choose the model that works the best.\n",
    "We will discuss these training considerations in the next few lectures."
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 2
}