{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Classification in PyTorch\n",
"\n",
"In this section, we're going to look at actually how\n",
"to define and debug a neural network in PyTorch. We will\n",
"also take the opportunity to go beyond a binary classification problem,\n",
"and instead work on a more general classification problem\n",
"\n",
"Let's start, as always, with our neural network model from last time."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn as nn\n",
"import torch.nn.functional as F\n",
"\n",
"class Model(nn.Module):\n",
" def __init__(self, num_hidden):\n",
" super(Model, self).__init__()\n",
" self.layer1 = nn.Linear(28 * 28, num_hidden)\n",
" self.layer2 = nn.Linear(num_hidden, 1)\n",
" self.num_hidden = num_hidden\n",
" def forward(self, img):\n",
" flattened = img.view(-1, 28 * 28)\n",
" activation1 = self.layer1(flattened)\n",
" activation1 = F.relu(activation1)\n",
" activation2 = self.layer2(activation1)\n",
" return activation2\n",
"\n",
"model = Model(30)"
]
},
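{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (a sketch: the `torch.randn` tensor below is a stand-in for a real\n",
"batch of images), we can call the model on a batch of the expected shape, and use the\n",
"`parameters()` method that `Model` inherits from `nn.Module` to inspect its learnable tensors:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fake_batch = torch.randn(4, 28, 28)  # stand-in for a batch of 4 images\n",
"out = model(fake_batch)              # calls model.forward via __call__\n",
"print(out.shape)                     # one output activation per image\n",
"for w in model.parameters():\n",
"    print(w.shape)                   # the weights and biases of layer1 and layer2\n"
]
},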
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The module `torch.nn` contains different classess that help you build\n",
"neural network models. All models in PyTorch inherit from the subclass `nn.Module`,\n",
"which has useful methods like `parameters()`, `__call__()` and others.\n",
"\n",
"This module `torch.nn` also has various *layers* that you can use to build\n",
"your neural network. For example, we used `nn.Linear` in our code above, which\n",
"constructs a fully connected layer. In particular, we defined two `nn.Linear`\n",
"layers as part of our network in the `__init__` method. Next week, we'll start\n",
"to see other types of layers like `nn.Conv2d`.\n",
"\n",
"(What exactly is a \"layer\"? It is essentially a step in the neural network computation.\n",
"We can also think of the ReLU activation as a \"layer\". However, there are no tunable\n",
"parameters associated with the ReLU activation function. We don't need to keep\n",
"track of \"states\" associated with the ReLU acitvation, so it is not initalized\n",
"as a \"layer\" in the `__init__` function.)\n",
"\n",
"\n",
"The `__init__` method is where we typically define the attributes of a class.\n",
"In our case, all the \"sub-components\" of our model should be defined here, along with\n",
"any other setting that we wish to save -- for example `self.num_hidden`.\n",
"\n",
"The `forward` method is called when we use the neural network to make a prediction.\n",
"Another term for \"making a prediction\" is **running the forward pass**, \n",
"because information flows *forward* from the input through the hidden layers to the output.\n",
"When we compute parameter updates,\n",
"we run the **backward pass** by calling the function `loss.backward()`. During the backward\n",
"pass, information about parameter changes flows *backwards*, from the output through the\n",
"hidden layers to the input.\n",
"\n",
"The `forward` method is called from the `__call__` function of `nn.Module`,\n",
"so that when we run `model(input)`, the `forward` method is called.\n",
"\n",
"\n",
"In our case, the `forward` function does the following:\n",
"\n",
"1. \"Flatten\" the input parameter `img`. The parameter `img` is a PyTorch tensor of dimension `batch_size x 28 x 28`, or `[-1, 28, 28]` (or possibly `[-1, 1, 28, 28]`). The dimension size `-1` is a placeholder for a \"unknown\" dimension size. After flattening, the variable `flattened` will be a PyTorch tensor of dimension `[-1, 28*28]`.\n",
"2. Run the forward pass of `self.layer1`, which computes activations of our hidden layer given our flattened input.\n",
"3. Pass those activations (`activation1`) through the ReLU nonlinearity.\n",
"4. Run the forward pass of `self.layer2`, which computes activations of our output layer given `activation2`.\n",
"\n",
"Note that in the last few classes, we have used the *sigmoid* activation function to turn the final `activation2`\n",
"value into a probability. This step is **not** a part of the `forward` method. The reason is that the computation\n",
"of the loss function is more numerically stable when we don't run the `sigmoid` function (we get a more\n",
"accurate loss function value because of the way floating-point values are represented on the computer).\n",
"\n",
"## Define a Neural Network\n",
"\n",
"To define our own neural network, we should understand the inputs and outputs that are expected.\n",
"For a **binary-classification problem**, our output can be a single neuron. We should then decide\n",
"on the architecture(s) that we want. How many layers should we have? How many neurons in each layer?\n",
"And later on -- what kind of layers will we use?\n",
"\n",
"Here is an example of a 4-layer neural network that performs binary classification on a 28x28 image."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class Model(nn.Module):\n",
" def __init__(self, num_hidden):\n",
" super(Model, self).__init__()\n",
" self.layer1 = nn.Linear(28 * 28, 100)\n",
" self.layer2 = nn.Linear(100, 50)\n",
" self.layer3 = nn.Linear(50, 20)\n",
" self.layer4 = nn.Linear(20, 1)\n",
" self.num_hidden = num_hidden\n",
" def forward(self, img):\n",
" flattened = img.view(-1, 28 * 28)\n",
" activation1 = F.relu(self.layer1(flattened))\n",
" activation2 = F.relu(self.layer2(activation1))\n",
" activation3 = F.relu(self.layer3(activation2))\n",
" output = self.layer4(activation3)\n",
" return output"
]
},
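{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can make the layer sizes concrete by printing the weight shape of each `nn.Linear` layer.\n",
"(A quick sketch; `named_parameters()` is another method inherited from `nn.Module`, and each\n",
"weight has shape `[out_features, in_features]`.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"four_layer_model = Model(30)\n",
"for name, w in four_layer_model.named_parameters():\n",
"    if name.endswith(\"weight\"):\n",
"        print(name, w.shape)  # the first dimension shrinks: 100 -> 50 -> 20 -> 1\n"
]
},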
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that in a fully-connected feed-forward network, the number of units in each layer always\n",
"decreases. The neural network is forced to *condense* information, step-by-step, until it computes\n",
"the target output we desire. When solving prediction problems, we will rarely (if ever) have a later\n",
"layer have more neurons than a previous layer.\n",
"\n",
"## N-ary Classification\n",
"\n",
"For the rest of this chapter, let's work on a slightly different classification problem. Instead of\n",
"a binary classification problem, we will work on a general classification problem, where an input value\n",
"will be classified into one of many categories.\n",
"\n",
"We will perform the digit classification problem: given an image of a hand-written digit, we will\n",
"predict what digit the image represents. We are already familiar with the MNIST data, but here it is again:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"from torchvision import datasets, transforms\n",
"\n",
"mnist_images = datasets.MNIST('data', train=True, download=True)\n",
"\n",
"for k, (image, label) in enumerate(mnist_images):\n",
" if k >= 18:\n",
" break\n",
" plt.subplot(3, 6, k+1)\n",
" plt.imshow(image)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will use 4096 training images, and 1024 validation images. (Normally, when we train\n",
"neural networks, we will try to use all the data that we have. The only reason I'm limiting\n",
"our training and validation set is so that the code runs quickly for demonstration purposes.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mnist_data = datasets.MNIST('data', train=True, transform=transforms.ToTensor())\n",
"mnist_data = list(mnist_data)\n",
"\n",
"mnist_train = mnist_data[:4096]\n",
"mnist_val = mnist_data[4096:5120]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our network will be a 3-layer neural network. Our input size is still 28x28, but our output\n",
"cannot be a single neuron any more! Instead, we will use 10 output neurons, one representing\n",
"each of the 10 digits. Our architecture will look like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class MNISTClassifier(nn.Module):\n",
" def __init__(self):\n",
" super(MNISTClassifier, self).__init__()\n",
" self.layer1 = nn.Linear(28 * 28, 50)\n",
" self.layer2 = nn.Linear(50, 20)\n",
" self.layer3 = nn.Linear(20, 10)\n",
" def forward(self, img):\n",
" flattened = img.view(-1, 28 * 28)\n",
" activation1 = F.relu(self.layer1(flattened))\n",
" activation2 = F.relu(self.layer2(activation1))\n",
" output = self.layer3(activation2)\n",
" return output\n",
"\n",
"model = MNISTClassifier()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we run the *forward pass* -- or attempt to make predictions, we will \n",
"get something like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"first_img, first_label = mnist_train[0]\n",
"output = model(first_img)\n",
"print(output)\n",
"print(output.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The tensor `output` shows the activation of the 10 output neurons in our neural network.\n",
"We still need to go from this output to either a (discrete) prediction, or a (continuous)\n",
"distribution showing a computed probability of the image belonging to each class (each digit).\n",
"The latter is more general, and is necessary when we define an optimizable loss function,\n",
"so let's talk about computing continuous probabilities.\n",
"\n",
"In the case of binary classification, we used the sigmoid function to turn an output\n",
"activation into a probability value between 0 and 1. In the n-ary case, we use the\n",
"multivariate analog of the sigmoid function called the `softmax`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"prob = F.softmax(output, dim=1)\n",
"print(prob)\n",
"print(sum(prob[0]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since `output` is a tensor of dimension `[1, 10]`, we need to tell PyTorch that we want\n",
"the softmax computed over the right-most dimension. This is necessary because\n",
"like most PyTorch functions, `F.softmax` can compute softmax probabilities for a\n",
"mini-batch of data. We need to clarify which dimension represents the different classes,\n",
"and which dimension represents different data points.\n",
"\n",
"(As an aside, compare the difference in setting `dim=0` vs `dim=1` below:)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"F.softmax(torch.tensor([[1,1.],[3,4.]]), dim=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"F.softmax(torch.tensor([[1,1.],[3,4.]]), dim=1)"
]
},
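{
"cell_type": "markdown",
"metadata": {},
"source": [
"In both cases we can check which dimension was normalized: the probabilities sum to 1\n",
"along the `dim` that we passed to `F.softmax`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"t = torch.tensor([[1, 1.], [3, 4.]])\n",
"print(F.softmax(t, dim=0).sum(dim=0))  # each column sums to 1\n",
"print(F.softmax(t, dim=1).sum(dim=1))  # each row sums to 1\n"
]
},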
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loss\n",
"\n",
"In our binary classification examples, we used the **binary cross-entropy loss**.\n",
"For general classification, we will use the more general **cross-entropy loss**,\n",
"and the same optimizer as before."
]
},
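{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before the full training loop, here is a minimal sketch of how `nn.CrossEntropyLoss` is called\n",
"(using random stand-in values): it expects the *raw* (pre-softmax) outputs, with one row per\n",
"data point, and the labels as plain integer class indices -- not one-hot vectors, and not\n",
"probabilities. The result is a single scalar, averaged over the batch."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"criterion = nn.CrossEntropyLoss()\n",
"fake_logits = torch.randn(8, 10)          # stand-in raw outputs for a batch of 8 images\n",
"fake_labels = torch.randint(0, 10, (8,))  # stand-in integer digit labels, shape [8]\n",
"print(criterion(fake_logits, fake_labels))\n"
]
},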
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch.optim as optim\n",
"\n",
"def train(model, data, batch_size=64, num_epochs=1):\n",
" train_loader = torch.utils.data.DataLoader(data, batch_size=batch_size)\n",
" criterion = nn.CrossEntropyLoss()\n",
" optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)\n",
"\n",
" iters, losses, train_acc, val_acc = [], [], [], []\n",
"\n",
" # training\n",
" n = 0 # the number of iterations\n",
" for epoch in range(num_epochs):\n",
" for imgs, labels in iter(train_loader):\n",
" out = model(imgs) # forward pass\n",
" loss = criterion(out, labels) # compute the total loss\n",
" loss.backward() # backward pass (compute parameter updates)\n",
" optimizer.step() # make the updates for each parameter\n",
" optimizer.zero_grad() # a clean up step for PyTorch\n",
"\n",
" # save the current training information\n",
" iters.append(n)\n",
" losses.append(float(loss)/batch_size) # compute *average* loss\n",
" train_acc.append(get_accuracy(model, train=True)) # compute training accuracy \n",
" val_acc.append(get_accuracy(model, train=False)) # compute validation accuracy\n",
" n += 1\n",
"\n",
" # plotting\n",
" plt.title(\"Training Curve\")\n",
" plt.plot(iters, losses, label=\"Train\")\n",
" plt.xlabel(\"Iterations\")\n",
" plt.ylabel(\"Loss\")\n",
" plt.show()\n",
"\n",
" plt.title(\"Training Curve\")\n",
" plt.plot(iters, train_acc, label=\"Train\")\n",
" plt.plot(iters, val_acc, label=\"Validation\")\n",
" plt.xlabel(\"Iterations\")\n",
" plt.ylabel(\"Training Accuracy\")\n",
" plt.legend(loc='best')\n",
" plt.show()\n",
"\n",
" print(\"Final Training Accuracy: {}\".format(train_acc[-1]))\n",
" print(\"Final Validation Accuracy: {}\".format(val_acc[-1]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And of course, we need the `get_accuracy` helper function. To turn the probabilities\n",
"into a discrete prediction, we will take the digit with the highest probability.\n",
"Because of the way softmax is computed, the digit with the highest probability is\n",
"the same as the digit with the (pre-activation) output value."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_acc_loader = torch.utils.data.DataLoader(mnist_train, batch_size=4096)\n",
"val_acc_loader = torch.utils.data.DataLoader(mnist_val, batch_size=1024)\n",
"\n",
"def get_accuracy(model, train=False):\n",
" if train:\n",
" data = mnist_train\n",
" else:\n",
" data = mnist_val\n",
"\n",
" correct = 0\n",
" total = 0\n",
" for imgs, labels in torch.utils.data.DataLoader(data, batch_size=64):\n",
" output = model(imgs) # We don't need to run F.softmax\n",
" pred = output.max(1, keepdim=True)[1] # get the index of the max log-probability\n",
" correct += pred.eq(labels.view_as(pred)).sum().item()\n",
" total += imgs.shape[0]\n",
" return correct / total"
]
},
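{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check of the claim that we can skip `F.softmax` when computing accuracy\n",
"(a sketch using random values in place of real network outputs): taking the argmax\n",
"before or after the softmax gives the same predictions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fake_out = torch.randn(5, 10)                     # stand-in for raw network outputs\n",
"pred_raw = fake_out.max(1)[1]                     # argmax of the raw outputs\n",
"pred_prob = F.softmax(fake_out, dim=1).max(1)[1]  # argmax of the probabilities\n",
"print(torch.equal(pred_raw, pred_prob))\n"
]
},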
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Debugging the Neural Network\n",
"\n",
"One technique that researchers often use to debug their network is to first\n",
"make sure that their network can overfit to a small dataset. This sanity check\n",
"ensures that you are using the right variable names, and rules out other\n",
"programming bugs that are difficult to discern from architecture issues.\n",
"\n",
"Common programming issues that can arise include:\n",
"\n",
"* Forgetting to call `optimizer.zero_grad()` when using PyTorch. In general, this line of code is included\n",
" at the beginning of the code for a training iteration, as opposed to at the end.\n",
"* Using the wrong `criterion`, or using a loss function with incorrectly formated variables.\n",
"* Adding a non-linearity after the final layer. In general we don't add a non-linearity in the `forward`\n",
" function of the network, so that the computation of the loss function and the associated optimization\n",
" steps are more numerically stable.\n",
"* Forgetting non-linearity layers in the `forward` function.\n",
"\n",
"Let's see if our network can overfit relatively quickly to a small dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"debug_data = mnist_train[:64]\n",
"model = MNISTClassifier()\n",
"train(model, debug_data, num_epochs=100)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Only when we have ensured that our model can overfit to a small dataset do we\n",
"begin training the neural network our full training set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = MNISTClassifier()\n",
"train(model, mnist_train, num_epochs=5)\n",
"\n",
"# save the model for next time\n",
"torch.save(model.state_dict(), \"saved_model\")"
]
},
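{
"cell_type": "markdown",
"metadata": {},
"source": [
"When we come back to this model, the saved parameters can be restored into a fresh\n",
"`MNISTClassifier` (a sketch; the file name \"saved_model\" matches the `torch.save` call above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loaded_model = MNISTClassifier()\n",
"loaded_model.load_state_dict(torch.load(\"saved_model\"))\n",
"loaded_model.eval()  # good practice before making predictions\n"
]
},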
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At this point, we can begin tuning hyperparameters, and tweak the architecture of our network\n",
"to improve our validation accuracy. We can also check for any underfitting or overfitting."
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 2
}