{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CSC321 Tutorial 4: Multi-Class Classification with PyTorch\n",
"\n",
"In this tutorial, we'll go through an example of a multi-class\n",
"linear classification problem using PyTorch.\n",
"\n",
"Training models in PyTorch requires much less of the kind of code that you\n",
"are required to write for project 1.\n",
"However, PyTorch hides a lot of details of the computation,\n",
"both of the computation of the prediction, and the computation of the gradients. In your later\n",
"projects, you'll work with both numpy to understand deeply how your models actually work, but\n",
"also learn PyTorch to gain practical skills in building machine learning models.\n",
"\n",
"In the process, we will:\n",
"\n",
"- Introduce the MNIST dataset, which contains 28x28 pixel images of hand-written digits\n",
"- Introduce how to use of PyTorch to build and train models\n",
"- (If we have time) explore the effect of certain settings on our model:\n",
" - Data set size\n",
" - Batch size\n",
" - Regularization"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data\n",
"\n",
"The MNIST dataset contains black and white, hand-written (numerical) digits\n",
"that are 28x28 pixels large. This is a data set that is typically used for\n",
"demonstrations of machine learning models, and as a first data set to test\n",
"new types of models.\n",
"\n",
"We will download the dataset. For simplicity, we'll only use the first 2500\n",
"images in the MNIST dataset. The first time you run this code, we will download\n",
"the MNIST dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from torchvision import datasets\n",
"\n",
"# load the training data\n",
"mnist_train = datasets.MNIST('data', train=True, download=True)\n",
"mnist_train = list(mnist_train)[:2500]\n",
"\n",
"print(mnist_train[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's take a look at some of the data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# plot the first 18 images in the training data\n",
"import matplotlib.pyplot as plt\n",
"for k, (image, label) in enumerate(mnist_train[:18]):\n",
" plt.subplot(3, 6, k+1)\n",
" plt.imshow(image, cmap='gray')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"PyTorch has code written for us to convert an image into numerical pixel features.\n",
"The tensor still preserves the 2D geometry of the image (we still get a `1x28x28` shape) \n",
"and does not yet flatten the image into a vector (to get a `1x784` shape) like we discussed\n",
"in lecture."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from torchvision import transforms\n",
"# transform the image data type to a 28x28 matrix of numbers\n",
"img_to_tensor = transforms.ToTensor()\n",
"\n",
"# convert the last image we saw into a tensor\n",
"img_tensor = img_to_tensor(image)\n",
"img_tensor.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we want to convert the entire dataset into these tensor representations (as opposed to\n",
"PIL.Image objects), there is a `transform` parameter that we can use when loading the MNIST\n",
"dataset:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mnist_train = datasets.MNIST('data', train=True, transform=img_to_tensor)\n",
"mnist_train = list(mnist_train)[:2500]\n",
"print(mnist_train[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we'll split this data into training and validation, and start to build our model.\n",
"We won't need a test set for this tutorial, but in general we will also have a test set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mnist_train, mnist_val = mnist_train[:2000], mnist_train[2000:]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Linear Model in PyTorch\n",
"\n",
"To build a linear model in PyTorch, we create an instance of the class `nn.Linear`,\n",
"and specify the number of input features, and the number of output features. For linear regression\n",
"and binary classification, the number of output features is 1. For multi-class classification,\n",
"we have as many outputs as there are classes.\n",
"\n",
"When using this model for classification, we'll need to apply the sigmoid or softmax\n",
"activiation *afterwards*. That is, this object is only meant to handle the linear part of the\n",
"model computation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn as nn\n",
"\n",
"example_model = nn.Linear(50, 1) # assume 50 features, 1 linear output"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `example_model` object contains weights and biases of the model. By default, PyTorch\n",
"initializes these values to a random number close to 0:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"weight, bias = list(example_model.parameters())\n",
"print(weight)\n",
"print(weight.shape)\n",
"print(bias)\n",
"print(bias.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we create a new model, those initial parameters will change:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"example_model = nn.Linear(50, 1)\n",
"weight, bias = list(example_model.parameters())\n",
"\n",
"# These values should be different from above\n",
"print(weight)\n",
"print(weight.shape)\n",
"print(bias)\n",
"print(bias.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's create the actual model that we will train to solve the MNIST\n",
"digit classification problem. How many input features do we have? How many\n",
"output features do we need?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = nn.Linear(784, 10) # 784 = 28*28\n",
"\n",
"# Let's verify that the shapes of the weights and biases are what we expect\n",
"weight, bias = list(model.parameters())\n",
"print(weight.shape)\n",
"print(bias.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Making Predictions\n",
"\n",
"Let's see how we can make a prediction with this model. (You might find it strange that \n",
"we're talking about how to make predictions *before* talking about how to train the model.\n",
"The reason is that we will always train the model using a varient of gradient descent.\n",
"So you can imagine that the weights of this model will eventually become more meaningful\n",
"than it is now)\n",
"\n",
"We'll start with the simpler `example_model` first. The way that we make predictions\n",
"is by starting with an input $x$ that has the required shape. Since `example_model` is\n",
"just an example, we'll create a tensor with the appropriate shape, filled with random values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = torch.randn(50) # create a rank 1 tensor (vector) with 50 features\n",
"x.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make predictions, we apply the `example_model` as if it is a function, with the \n",
"inputs as an argument:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y = example_model(x)\n",
"y.shape"
]
},
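{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what this call computes, here is a minimal sketch that compares a linear layer's\n",
"output against the explicit computation $Wx + b$. The names `demo_layer` and `demo_x`\n",
"are made up for this illustration and are not used elsewhere in the tutorial."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn as nn\n",
"\n",
"demo_layer = nn.Linear(50, 1)  # hypothetical layer, separate from example_model\n",
"demo_x = torch.randn(50)\n",
"\n",
"# nn.Linear stores a weight of shape [1, 50] and a bias of shape [1]\n",
"manual = demo_layer.weight @ demo_x + demo_layer.bias\n",
"\n",
"# the two results should agree up to floating-point error\n",
"print(torch.allclose(demo_layer(demo_x), manual, atol=1e-6))"
]
},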
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If this model was used for binary classification, we might also need to apply the sigmoid \n",
"function:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"torch.sigmoid(example_model(x))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One nice thing about PyTorch is that it vectorizes and parallelizes the computation for us.\n",
"So, if we had a *batch* of 32 inputs that we want to make predictions for, we can perform\n",
"that computation using a single call:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = torch.randn([32, 50]) # a stack of 32 inputs\n",
"print(x.shape)\n",
"y = example_model(x)\n",
"print(y.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"(Note: The order of the dimensions in our input $x$ matters. The batch size always goes first,\n",
"and the number of features always goes second)\n",
"\n",
"Now, let's try and make some \"predictions\" with our MNIST model! We still have\n",
"the variable `image_tensor` from earlier:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"img_tensor.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, the shape of this tensor is not what we need it to be.\n",
"We need to *flatten* the image into either a rank 1 tensor (with shape [784])\n",
"or a rank 2 tensor (with shape [1, 784]). We'll choose the latter, so\n",
"that the transition to passing multiple images at the same time is easier:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = img_tensor.view(1, 784)\n",
"print(x.shape)\n",
"z = model(x)\n",
"print(z)\n",
"print(z.shape)\n",
"y = torch.softmax(z, dim=1)\n",
"print(y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `dim=1` in the softmax tells PyTorch which dimension represents different\n",
"images, and which one represents the different class labels. We want our\n",
"outputs $y$ to be a probability distribution across the *classes*, and not\n",
"the different images.\n",
"\n",
"## Loss Function\n",
"\n",
"In order for the network to be useful, we need to actually train it, so\n",
"that the weights are actually meaningful, non-random values. As we mentioned\n",
"before, we'll use the network to make predictions, then compare the predictions\n",
"agains the ground truth via the loss function.\n",
"\n",
"PyTorch has standard loss functions that we can use: for example,\n",
"`nn.BCEWithLogitsLoss()` for a binary-classification problem, and a \n",
"`nn.CrossEntropyLoss()` for a multi-class classification problem like ours."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"criterion = nn.CrossEntropyLoss()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This criterion can also be called as a function. It takes the logit prediction and\n",
"ground-truth as parameters, and returns the loss. Two things to keep in mind\n",
"for this function:\n",
"\n",
"1. Loss functions like this usually takes the **logit** as parameter, rather than\n",
" the post-softmax probability distributions. This is for numerical stability.\n",
"2. This loss function also takes the ground-truth integer **index** as a parameter,\n",
" rather than a one-hot vector."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loss = criterion(y, torch.Tensor([8]).long()) # digit 8 = the 8-th class\n",
"print(loss)"
]
},
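{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check on what the criterion computes, this sketch compares `nn.CrossEntropyLoss`\n",
"with the negative log of the softmax probability assigned to the target class. It uses\n",
"made-up random logits rather than the MNIST model, and the `demo_` names are assumptions\n",
"for illustration only."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn as nn\n",
"\n",
"torch.manual_seed(0)\n",
"demo_logits = torch.randn(1, 10)  # one made-up example with 10 class scores\n",
"demo_target = torch.tensor([8])   # ground-truth class index\n",
"\n",
"demo_loss = nn.CrossEntropyLoss()(demo_logits, demo_target)\n",
"\n",
"# the same quantity by hand: -log(softmax(logits)[target])\n",
"manual_loss = -torch.log_softmax(demo_logits, dim=1)[0, 8]\n",
"\n",
"print(demo_loss, manual_loss)"
]
},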
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Optimization and Weight Decay\n",
"\n",
"PyTorch also computes derivatives for us using *automatic differentiation*, which\n",
"we (might) talk about in this course. In short, we can specify an **optimizer**\n",
"(like Stochastic Gradient Descent), and use the optimizer to determine how to\n",
"update the weights."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch.optim as optim\n",
"optimizer = optim.SGD(model.parameters(), lr=0.005) # lr = learning rate\n",
"\n",
"# There are three lines of code required to perform \n",
"# a gradient descent update:\n",
"loss.backward() # compute updates for each parameter\n",
"optimizer.step() # make the updates for each parameter\n",
"optimizer.zero_grad() # a clean up step for PyTorch"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also use weight decay (L2 regularization) in PyTorch through the optimizer:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"optimizer = optim.SGD(model.parameters(), lr=0.005, weight_decay=0.01)"
]
},
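{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the effect of `weight_decay` concrete, here is a small sketch with a made-up\n",
"two-element parameter and a loss whose gradient is zero, so the only change comes from\n",
"weight decay. In `optim.SGD`, the term `weight_decay * w` is added to the gradient before\n",
"the update. The names `w_demo` and `demo_opt` are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torch.optim as optim\n",
"\n",
"w_demo = torch.nn.Parameter(torch.tensor([1.0, -2.0]))  # hypothetical parameter\n",
"demo_opt = optim.SGD([w_demo], lr=0.1, weight_decay=0.5)\n",
"\n",
"zero_grad_loss = (w_demo * 0).sum()  # the gradient of this loss is zero everywhere\n",
"zero_grad_loss.backward()\n",
"demo_opt.step()\n",
"\n",
"# update: w - lr * (grad + weight_decay * w) = w * (1 - 0.1 * 0.5) = 0.95 * w\n",
"print(w_demo.data)"
]
},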
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Batching\n",
"\n",
"PyTorch data loader also does batching for us!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_loader = torch.utils.data.DataLoader(mnist_train,\n",
" batch_size=32, # batch size\n",
" shuffle=True) # shuffle before each epoch\n",
"\n",
"for (xs, ts) in enumerate(train_loader):\n",
" print(xs) # image pixels\n",
" print(ts) # targets\n",
" break\n",
"\n",
"# Try changing the batch_size above"
]
},
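{
"cell_type": "markdown",
"metadata": {},
"source": [
"A data set's size is usually not a multiple of the batch size, so the last batch of an\n",
"epoch can be smaller than the rest. A minimal sketch with a made-up data set of 10 items\n",
"(the names `demo_data` and `demo_loader` are hypothetical):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"from torch.utils.data import DataLoader, TensorDataset\n",
"\n",
"demo_inputs = torch.arange(10).float().view(10, 1)  # 10 made-up inputs, 1 feature each\n",
"demo_targets = torch.arange(10)                     # 10 made-up targets\n",
"demo_data = TensorDataset(demo_inputs, demo_targets)\n",
"demo_loader = DataLoader(demo_data, batch_size=4, shuffle=False)\n",
"\n",
"# collect the number of examples in each batch\n",
"batch_sizes = [len(ts) for xs, ts in demo_loader]\n",
"print(batch_sizes)  # [4, 4, 2]"
]
},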
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Putting it all together..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def run_gradient_descent(model,\n",
" batch_size=64,\n",
" learning_rate=0.01,\n",
" weight_decay=0,\n",
" num_epochs=10):\n",
" criterion = nn.CrossEntropyLoss()\n",
" optimizer = optim.SGD(model.parameters(), lr=learning_rate, weight_decay=weight_decay)\n",
"\n",
" iters, losses = [], []\n",
" iters_sub, train_acc, val_acc = [], [] ,[]\n",
"\n",
" train_loader = torch.utils.data.DataLoader(\n",
" mnist_train,\n",
" batch_size=batch_size,\n",
" shuffle=True)\n",
"\n",
" # training\n",
" n = 0 # the number of iterations\n",
" for epoch in range(num_epochs):\n",
" for xs, ts in iter(train_loader):\n",
" if len(ts) != batch_size:\n",
" continue\n",
" xs = xs.view(-1, 784) # flatten the image. The -1 is a wildcard\n",
" zs = model(xs)\n",
" loss = criterion(zs, ts) # compute the total loss\n",
" loss.backward() # compute updates for each parameter\n",
" optimizer.step() # make the updates for each parameter\n",
" optimizer.zero_grad() # a clean up step for PyTorch\n",
"\n",
" # save the current training information\n",
" iters.append(n)\n",
" losses.append(float(loss)/batch_size) # compute *average* loss\n",
"\n",
" if n % 10 == 0:\n",
" iters_sub.append(n)\n",
" train_acc.append(get_accuracy(model, mnist_train))\n",
" val_acc.append(get_accuracy(model, mnist_val))\n",
" # increment the iteration number\n",
" n += 1\n",
"\n",
" # plotting\n",
" plt.title(\"Training Curve (batch_size={}, lr={})\".format(batch_size, learning_rate))\n",
" plt.plot(iters, losses, label=\"Train\")\n",
" plt.xlabel(\"Iterations\")\n",
" plt.ylabel(\"Loss\")\n",
" plt.show()\n",
"\n",
" plt.title(\"Training Curve (batch_size={}, lr={})\".format(batch_size, learning_rate))\n",
" plt.plot(iters_sub, train_acc, label=\"Train\")\n",
" plt.plot(iters_sub, val_acc, label=\"Validation\")\n",
" plt.xlabel(\"Iterations\")\n",
" plt.ylabel(\"Accuracy\")\n",
" plt.legend(loc='best')\n",
" plt.show()\n",
"\n",
" return model\n",
"\n",
"def get_accuracy(model, data):\n",
" loader = torch.utils.data.DataLoader(data, batch_size=500)\n",
"\n",
" correct, total = 0, 0\n",
" for xs, ts in loader:\n",
" xs = xs.view(-1, 784) # flatten the image\n",
" zs = model(xs)\n",
" pred = zs.max(1, keepdim=True)[1] # get the index of the max logit\n",
" correct += pred.eq(ts.view_as(pred)).sum().item()\n",
" total += int(ts.shape[0])\n",
" return correct / total"
]
},
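{
"cell_type": "markdown",
"metadata": {},
"source": [
"The accuracy computation in `get_accuracy` picks the class with the largest logit for each\n",
"image and compares it with the target. Here is a tiny sketch of the same idea with hand-made\n",
"logits for three examples and three classes. All values and `demo_` names are made up, and\n",
"`argmax` is used here as an equivalent way to get the index of the largest logit."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"\n",
"demo_zs = torch.tensor([[0.1, 2.0, -1.0],\n",
"                        [1.5, 0.2, 0.3],\n",
"                        [0.0, 0.1, 3.0]])  # made-up logits, one row per example\n",
"demo_ts = torch.tensor([1, 0, 1])           # made-up targets\n",
"\n",
"demo_pred = demo_zs.argmax(dim=1)           # index of the largest logit in each row\n",
"demo_acc = (demo_pred == demo_ts).float().mean().item()\n",
"\n",
"print(demo_pred)  # predictions 1, 0, 2: the third one is wrong\n",
"print(demo_acc)   # 2 of 3 correct"
]
},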
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's try training this model!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = nn.Linear(784, 10)\n",
"run_gradient_descent(model, batch_size=64, learning_rate=0.01, num_epochs=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Things to try:\n",
"\n",
"- Changing the batch size\n",
"- Changing the weight decay parameter\n",
"- Reduce the size of the training set (+ weight decay)\n",
"- Changing the learning rate (for your project)"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 2
}