{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CSC321H5 Project 2.\n",
"\n",
"**Deadline**: Thursday, Feb. 13, by 9pm\n",
"\n",
"**Submission**: Submit a PDF export of the completed notebook. \n",
"\n",
"**Late Submission**: Please see the syllabus for the late submission criteria.\n",
"\n",
"Based on an assignment by George Dahl, Jing Yao Li, and Roger Grosse\n",
"\n",
"In this assignment, we will make a neural network that can predict the next word\n",
"in a sentence given the previous three. \n",
"We'll explore a couple of different models to perform this prediction task. We will also do this\n",
"problem twice: once in PyTorch, and once using numpy. When using numpy, you'll implement\n",
"the backpropagation computation.\n",
"\n",
"In doing this prediction task, our neural networks will learn about *words* and about\n",
"how to represent words. We'll explore the *vector representations* of words that our\n",
"model produces, and analyze these representations.\n",
"\n",
"You may modify the starter code as you see fit, including changing the signatures of\n",
"functions and adding/removing helper functions. However, please make sure that your\n",
"TA can understand what you are doing and why."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"import torch\n",
"import torch.nn as nn\n",
"import torch.optim as optim"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 1. Data\n",
"\n",
"With any machine learning problem, the first thing we want to do\n",
"is get an intuitive understanding of what our data looks like. Download the file\n",
"`raw_sentences.txt` from `https://www.cs.toronto.edu/~lczhang/321/hw/raw_sentences.txt`\n",
"and upload it to Google Drive.\n",
"Then, mount Google Drive from your Google Colab notebook:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import drive\n",
"drive.mount('/content/gdrive')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Find the path to `raw_sentences.txt`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"file_path = '/content/gdrive/My Drive/CSC321/raw_sentences.txt' # TODO - UPDATE ME!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You might find it helpful to know that you can run shell commands (like `ls`) by\n",
"using `!` in Google Colab, like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!ls /content/gdrive/My\\ Drive/\n",
"!mkdir /content/gdrive/My\\ Drive/CSC321"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following code reads the sentences in our file, splits each sentence into\n",
"its individual words, and stores the sentences (each a list of words) in the\n",
"variable `sentences`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sentences = []\n",
"for line in open(file_path):\n",
" words = line.split()\n",
" sentence = [word.lower() for word in words]\n",
" sentences.append(sentence)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are 97,162 sentences in total, and \n",
"these sentences are composed of 250 distinct words."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"vocab = set([w for s in sentences for w in s])\n",
"print(len(sentences)) # 97162\n",
"print(len(vocab)) # 250"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll separate our data into training, validation, and test.\n",
"We'll use 10,000 sentences for test, 10,000 for validation, and\n",
"the rest for training."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test, valid, train = sentences[:10000], sentences[10000:20000], sentences[20000:]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (a) -- 2 pts\n",
"\n",
"Display 10 sentences in the training set.\n",
"Explain how punctuation is treated in our word representation, and how words\n",
"with apostrophes are represented."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Your code goes here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (b) -- 2 pts\n",
"\n",
"What are the 10 most common words in the vocabulary? How often does each of these\n",
"words appear in the training sentences? Express the second quantity as a percentage\n",
"(i.e. the number of occurrences of the word / the total number of words in the training set).\n",
"\n",
"These are good quantities to compute, because one of the first things a machine learning\n",
"model will learn is to predict the **most common** class. Getting a sense of the\n",
"distribution of our data will help you understand our model's behaviour.\n",
"\n",
"You can use Python's `collections.Counter` class if you would like to."
]
},
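{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hint, here is a toy illustration of `collections.Counter` on made-up words (not the assignment data): it tallies items, ranks them with `most_common`, and dividing a count by the total number of tokens gives the percentage asked for above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"# Toy example: tally word frequencies in a tiny, made-up corpus.\n",
"toy_words = ['the', 'cat', 'sat', 'on', 'the', 'mat', 'the']\n",
"counts = Counter(toy_words)\n",
"print(counts.most_common(2))           # [('the', 3), ('cat', 1)]\n",
"print(counts['the'] / len(toy_words))  # fraction of all tokens: 3/7"
]
},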
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Your code goes here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (c) -- 4 pts\n",
"\n",
"Complete the helper functions `convert_words_to_indices` and\n",
"`generate_4grams`, so that the function `process_data` will take a\n",
"list of sentences (i.e. a list of lists of words), and generate an\n",
"$N \\times 4$ numpy matrix containing the indices of 4 words that appear\n",
"next to each other. You can use the constants `vocab`, `vocab_itos`,\n",
"and `vocab_stoi` in your code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A list of all the words in the data set. We will assign a unique\n",
"# identifier to each of these words.\n",
"vocab = sorted(list(set([w for s in train for w in s])))\n",
"# A mapping of index => word (string)\n",
"vocab_itos = dict(enumerate(vocab))\n",
"# A mapping of word => its index\n",
"vocab_stoi = {word:index for index, word in vocab_itos.items()}\n",
"\n",
"def convert_words_to_indices(sents):\n",
" \"\"\"\n",
" This function takes a list of sentences (list of list of words)\n",
" and returns a new list with the same structure, but where each word\n",
" is replaced by its index in `vocab_stoi`.\n",
"\n",
" Example:\n",
" >>> convert_words_to_indices([['one', 'in', 'five', 'are', 'over', 'here'], ['other', 'one', 'since', 'yesterday'], ['you']])\n",
" [[148, 98, 70, 23, 154, 89], [151, 148, 181, 246], [248]]\n",
" \"\"\"\n",
"\n",
" # Write your code here\n",
"\n",
"def generate_4grams(seqs):\n",
" \"\"\"\n",
" This function takes a list of sentences (list of lists) and returns\n",
"    a new list containing the 4-grams (four consecutively occurring words)\n",
" that appear in the sentences. Note that a unique 4-gram can appear multiple\n",
" times, one per each time that the 4-gram appears in the data parameter `seqs`.\n",
"\n",
" Example:\n",
"\n",
" >>> generate_4grams([[148, 98, 70, 23, 154, 89], [151, 148, 181, 246], [248]])\n",
" [[148, 98, 70, 23], [98, 70, 23, 154], [70, 23, 154, 89], [151, 148, 181, 246]]\n",
" >>> generate_4grams([[1, 1, 1, 1, 1]])\n",
" [[1, 1, 1, 1], [1, 1, 1, 1]]\n",
" \"\"\"\n",
"\n",
" # Write your code here\n",
"\n",
"def process_data(sents):\n",
" \"\"\"\n",
" This function takes a list of sentences (list of lists), and generates an\n",
" numpy matrix with shape [N, 4] containing indices of words in 4-grams.\n",
" \"\"\"\n",
" indices = convert_words_to_indices(sents)\n",
" fourgrams = generate_4grams(indices)\n",
" return np.array(fourgrams)\n",
"\n",
"train4grams = process_data(train)\n",
"valid4grams = process_data(valid)\n",
"test4grams = process_data(test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 2. A Multi-Layer Perceptron\n",
"\n",
"In this section, we will build a two-layer multi-layer perceptron.\n",
"We will first do this in numpy, and then once more in PyTorch.\n",
"Our model will look like this:\n",
"\n",
"\n",
"\n",
"Start by reviewing these helper functions, which are given to you:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def make_onehot(indices, total=250):\n",
"    \"\"\"\n",
"    Convert indices into one-hot vectors by\n",
"    1. Creating an identity matrix of shape [total, total]\n",
"    2. Indexing the appropriate rows of that identity matrix\n",
"    \"\"\"\n",
"    I = np.eye(total)\n",
"    return I[indices]\n",
"\n",
"def softmax(x):\n",
" \"\"\"\n",
" Compute the softmax of vector x, or row-wise for a matrix x.\n",
"    We subtract each row's maximum before exponentiating, for numerical stability.\n",
" \"\"\"\n",
" x = x.T\n",
" exps = np.exp(x - x.max(axis=0))\n",
" probs = exps / np.sum(exps, axis=0)\n",
" return probs.T\n",
"\n",
"def get_batch(data, range_min, range_max, onehot=True):\n",
" \"\"\"\n",
" Convert one batch of data in the form of 4-grams into input and output\n",
" data and return the training data (xs, ts) where:\n",
"    - `xs` is a numpy array of one-hot vectors of shape [batch_size, 3, 250]\n",
"    - `ts` is either\n",
"        - a numpy array of shape [batch_size, 250] if onehot is True,\n",
"        - a numpy array of shape [batch_size] containing indices otherwise\n",
"\n",
" Preconditions:\n",
" - `data` is a numpy array of shape [N, 4] produced by a call\n",
" to `process_data`\n",
" - range_max > range_min\n",
" \"\"\"\n",
" xs = data[range_min:range_max, :3]\n",
" xs = make_onehot(xs)\n",
" ts = data[range_min:range_max, 3]\n",
" if onehot:\n",
" ts = make_onehot(ts).reshape(-1, 250)\n",
" return xs, ts\n",
"\n",
"def estimate_accuracy(model, data, batch_size=5000, max_N=100000):\n",
" \"\"\"\n",
" Estimate the accuracy of the model on the data. To reduce\n",
" computation time, use at most `max_N` elements of `data` to\n",
" produce the estimate.\n",
" \"\"\"\n",
" correct = 0\n",
" N = 0\n",
" for i in range(0, data.shape[0], batch_size):\n",
" xs, ts = get_batch(data, i, i + batch_size, onehot=False)\n",
" y = model(xs)\n",
" pred = np.argmax(y, axis=1)\n",
" correct += np.sum(ts == pred)\n",
" N += ts.shape[0]\n",
"\n",
" if N > max_N:\n",
" break\n",
" return correct / N"
]
},
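{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check on the helpers above, the snippet below re-declares minimal versions of `make_onehot` and `softmax` (so it runs standalone) and verifies the output shapes and that each softmax row sums to 1, even for very large logits."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def make_onehot(indices, total=250):\n",
"    # Row i of the identity matrix is the one-hot vector for index i.\n",
"    return np.eye(total)[indices]\n",
"\n",
"def softmax(x):\n",
"    # Subtract each row's max before exponentiating, for numerical stability.\n",
"    x = x.T\n",
"    exps = np.exp(x - x.max(axis=0))\n",
"    return (exps / np.sum(exps, axis=0)).T\n",
"\n",
"xs = make_onehot(np.array([[3, 1, 0], [2, 2, 1]]))   # shape [2, 3, 250]\n",
"probs = softmax(np.array([[1., 2., 3.], [1000., 1000., 1000.]]))\n",
"print(xs.shape)           # (2, 3, 250)\n",
"print(probs.sum(axis=1))  # each row sums to 1"
]
},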
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (a) -- 2 pts\n",
"\n",
"Your first task is to implement an MLP model in numpy.\n",
"This model is very similar to the one we built in Tutorial 5. However, we will\n",
"write our code differently from Tutorial 5, so that the class methods and APIs\n",
"are similar to that of PyTorch. This is to give you some intuition about what\n",
"PyTorch is doing under the hood.\n",
"\n",
"We already wrote code for the backward pass for this model in Tutorial 5, so the\n",
"code is given to you. To make sure you understand how the model works, \n",
"**write the code to compute the forward pass**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class NumpyMLPModel(object):\n",
" def __init__(self, num_features=250*3, num_hidden=400, num_classes=250):\n",
" \"\"\"\n",
" Initialize the weights and biases of this two-layer MLP.\n",
" \"\"\"\n",
" self.num_features = num_features\n",
" self.num_hidden = num_hidden\n",
" self.num_classes = num_classes\n",
" self.weights1 = np.zeros([num_hidden, num_features])\n",
" self.bias1 = np.zeros([num_hidden])\n",
" self.weights2 = np.zeros([num_classes, num_hidden])\n",
" self.bias2 = np.zeros([num_classes])\n",
" self.cleanup()\n",
"\n",
" def initializeParams(self):\n",
" \"\"\"\n",
" Initialize the weights and biases of this two-layer MLP to be random.\n",
" This random initialization is necessary to break the symmetry in the\n",
" gradient descent update for our hidden weights and biases. If all our\n",
"        weights were initialized to the same value, then their gradients would\n",
"        all be the same!\n",
" \"\"\"\n",
" self.weights1 = np.random.normal(0, 2/self.num_features, self.weights1.shape)\n",
" self.bias1 = np.random.normal(0, 2/self.num_features, self.bias1.shape)\n",
" self.weights2 = np.random.normal(0, 2/self.num_hidden, self.weights2.shape)\n",
" self.bias2 = np.random.normal(0, 2/self.num_hidden, self.bias2.shape)\n",
"\n",
" def forward(self, inputs):\n",
" \"\"\"\n",
" Compute the forward pass prediction for inputs.\n",
" Note that `inputs` will be a rank-3 numpy array with shape [N, 3, 250],\n",
" so we will need to flatten the tensor to [N, 750] first.\n",
"\n",
" For the ReLU activation, you may find the function `np.maximum` helpful\n",
" \"\"\"\n",
" X = inputs.reshape([-1, 750])\n",
"\n",
" # TODO:\n",
"\n",
" self.N = X.shape[0]\n",
" self.X = X\n",
" self.z1 = None\n",
" self.h = None\n",
" self.z2 = None\n",
" self.y = None\n",
" return self.y\n",
"\n",
" def __call__(self, inputs):\n",
" \"\"\"\n",
"        To be compatible with the PyTorch API. With this code, the following two\n",
" calls are identical:\n",
"\n",
"        >>> m = NumpyMLPModel()\n",
" >>> m.forward(inputs)\n",
"\n",
" and \n",
"\n",
"        >>> m = NumpyMLPModel()\n",
" >>> m(inputs)\n",
" \"\"\"\n",
" return self.forward(inputs)\n",
"\n",
" def backward(self, ts):\n",
" \"\"\"\n",
" Compute the backward pass, given the ground-truth, one-hot targets.\n",
" Note that `ts` needs to be a rank 2 numpy array with shape [N, 250].\n",
" \"\"\"\n",
" self.z2_bar = (self.y - ts) / self.N\n",
" self.w2_bar = np.dot(self.z2_bar.T, self.h)\n",
" self.b2_bar = np.dot(self.z2_bar.T, np.ones(self.N))\n",
" self.h_bar = np.matmul(self.z2_bar, self.weights2)\n",
" self.z1_bar = self.h_bar * (self.z1 > 0)\n",
" self.w1_bar = np.dot(self.z1_bar.T, self.X)\n",
" self.b1_bar = np.dot(self.z1_bar.T, np.ones(self.N))\n",
"\n",
" def update(self, alpha):\n",
" \"\"\"\n",
" Compute the gradient descent update for the parameters.\n",
" \"\"\"\n",
" self.weights1 = self.weights1 - alpha * self.w1_bar\n",
" self.bias1 = self.bias1 - alpha * self.b1_bar\n",
" self.weights2 = self.weights2 - alpha * self.w2_bar\n",
" self.bias2 = self.bias2 - alpha * self.b2_bar\n",
"\n",
" def cleanup(self):\n",
" \"\"\"\n",
" Erase the values of the variables that we use in our computation.\n",
" \"\"\"\n",
" self.N = None\n",
" self.X = None\n",
" self.z1 = None\n",
" self.h = None\n",
" self.z2 = None\n",
" self.y = None\n",
" self.z2_bar = None\n",
" self.w2_bar = None\n",
" self.b2_bar = None\n",
" self.h_bar = None\n",
" self.z1_bar = None\n",
" self.w1_bar = None\n",
" self.b1_bar = None"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (b) -- 2 points\n",
"\n",
"Complete the `run_gradient_descent` function. Train your numpy MLP model \n",
"to obtain a training accuracy of at least 25%. You do not need to train\n",
"this model to convergence."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def run_gradient_descent(model,\n",
" train_data=train4grams,\n",
" validation_data=valid4grams,\n",
" batch_size=100,\n",
" learning_rate=0.1,\n",
" max_iters=5000):\n",
" \"\"\"\n",
"    Use gradient descent to train the numpy model on the dataset `train_data`.\n",
" \"\"\"\n",
" n = 0\n",
" while n < max_iters:\n",
" # shuffle the training data, and break early if we don't have\n",
"        # enough data remaining for a full batch\n",
" np.random.shuffle(train_data)\n",
" for i in range(0, train_data.shape[0], batch_size):\n",
" if (i + batch_size) > train_data.shape[0]:\n",
" break\n",
"\n",
" # get the input and targets of a minibatch\n",
" xs, ts = get_batch(train_data, i, i + batch_size, onehot=True)\n",
"\n",
" # forward pass: compute prediction\n",
"\n",
"            # TODO: add your code here (assign the model's prediction to `y`)\n",
"\n",
" # backward pass: compute error \n",
" \n",
" # TODO: add your code here\n",
"\n",
" # increment the iteration count\n",
" n += 1\n",
"\n",
" # compute and plot the *validation* loss and accuracy\n",
" if (n % 100 == 0):\n",
" train_cost = -np.sum(ts * np.log(y)) / batch_size\n",
" train_acc = estimate_accuracy(model, train_data)\n",
" val_acc = estimate_accuracy(model, validation_data)\n",
" model.cleanup()\n",
" print(\"Iter %d. [Val Acc %.0f%%] [Train Acc %.0f%%, Loss %f]\" % (\n",
" n, val_acc * 100, train_acc * 100, train_cost))\n",
"\n",
" if n >= max_iters:\n",
" return\n",
"\n",
"\n",
"numpy_mlp = NumpyMLPModel()\n",
"numpy_mlp.initializeParams()\n",
"# run_gradient_descent(...)"
]
},
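{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `train_cost` line above is the average cross-entropy between the one-hot targets `ts` and the predicted distribution `y` that your forward pass should produce. Here is a standalone numpy sketch of that computation on a toy batch (made-up numbers, not the assignment data):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Toy batch: 2 examples, 4 classes. `ts` is one-hot; each row of `y`\n",
"# is a predicted probability distribution (rows sum to 1).\n",
"ts = np.array([[0., 1., 0., 0.],\n",
"               [0., 0., 0., 1.]])\n",
"y = np.array([[0.10, 0.70, 0.10, 0.10],\n",
"              [0.25, 0.25, 0.25, 0.25]])\n",
"\n",
"batch_size = ts.shape[0]\n",
"# The one-hot target picks out the log-probability of the correct class.\n",
"cost = -np.sum(ts * np.log(y)) / batch_size\n",
"print(cost)  # (-log 0.7 - log 0.25) / 2, roughly 0.87"
]
},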
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (c) -- 2 pts\n",
"\n",
"We will now build the same model in PyTorch. Since PyTorch uses automatic\n",
"differentiation, we only need to write the *forward pass* of our\n",
"model. Complete the `forward` function below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class PyTorchMLP(nn.Module):\n",
" def __init__(self, num_hidden=400):\n",
" super(PyTorchMLP, self).__init__()\n",
" self.layer1 = nn.Linear(750, num_hidden)\n",
" self.layer2 = nn.Linear(num_hidden, 250)\n",
" self.num_hidden = num_hidden\n",
" def forward(self, inp):\n",
" inp = inp.reshape([-1, 750])\n",
" # TODO: complete this function\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (d) -- 4 pts\n",
"\n",
"We'll write similar code to train the PyTorch model, with a few differences:\n",
"\n",
"1. We will use a slightly fancier optimizer called **Adam**. For this optimizer,\n",
" a smaller learning rate usually works better, so the default learning\n",
" rate is set to 0.001.\n",
"2. Since PyTorch gives us weight decay for free (the optimizer's `weight_decay` parameter), you are welcome to use it.\n",
"\n",
"Complete the function `run_pytorch_gradient_descent`, and use it to train\n",
"your PyTorch MLP model to obtain a training accuracy of at least 38%.\n",
"Plot the learning curve using the `plot_learning_curve` function provided\n",
"to you, and include your plot in your PDF submission."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def estimate_accuracy_torch(model, data, batch_size=5000, max_N=100000):\n",
" \"\"\"\n",
" Estimate the accuracy of the model on the data. To reduce\n",
" computation time, use at most `max_N` elements of `data` to\n",
" produce the estimate.\n",
" \"\"\"\n",
" correct = 0\n",
" N = 0\n",
" for i in range(0, data.shape[0], batch_size):\n",
" # get a batch of data\n",
" xs, ts = get_batch(data, i, i + batch_size, onehot=False)\n",
" \n",
" # forward pass prediction\n",
" y = model(torch.Tensor(xs))\n",
" y = y.detach().numpy() # convert the PyTorch tensor => numpy array\n",
" pred = np.argmax(y, axis=1)\n",
" correct += np.sum(pred == ts)\n",
" N += ts.shape[0]\n",
"\n",
" if N > max_N:\n",
" break\n",
" return correct / N\n",
"\n",
"def run_pytorch_gradient_descent(model,\n",
" train_data=train4grams,\n",
" validation_data=valid4grams,\n",
" batch_size=100,\n",
" learning_rate=0.001,\n",
" weight_decay=0,\n",
" max_iters=1000,\n",
" checkpoint_path=None):\n",
" \"\"\"\n",
" Train the PyTorch model on the dataset `train_data`, reporting\n",
" the validation accuracy on `validation_data`, for `max_iters`\n",
"    iterations.\n",
"\n",
" If you want to **checkpoint** your model weights (i.e. save the\n",
" model weights to Google Drive), then the parameter\n",
" `checkpoint_path` should be a string path with `{}` to be replaced\n",
" by the iteration count:\n",
"\n",
" For example, calling \n",
"\n",
" >>> run_pytorch_gradient_descent(model, ...,\n",
" checkpoint_path = '/content/gdrive/My Drive/CSC321/mlp/ckpt-{}.pk')\n",
"\n",
" will save the model parameters in Google Drive every 500 iterations.\n",
" You will have to make sure that the path exists (i.e. you'll need to create\n",
" the folder CSC321, mlp, etc...). Your Google Drive will be populated with files:\n",
"\n",
" - /content/gdrive/My Drive/CSC321/mlp/ckpt-500.pk\n",
" - /content/gdrive/My Drive/CSC321/mlp/ckpt-1000.pk\n",
" - ...\n",
"\n",
" To load the weights at a later time, you can run:\n",
"\n",
" >>> model.load_state_dict(torch.load('/content/gdrive/My Drive/CSC321/mlp/ckpt-500.pk'))\n",
"\n",
" This function returns the training loss, and the training/validation accuracy,\n",
" which we can use to plot the learning curve.\n",
" \"\"\"\n",
" criterion = nn.CrossEntropyLoss()\n",
" optimizer = optim.Adam(model.parameters(),\n",
" lr=learning_rate,\n",
" weight_decay=weight_decay)\n",
"\n",
" iters, losses = [], []\n",
"    iters_sub, train_accs, val_accs = [], [], []\n",
"\n",
" n = 0 # the number of iterations\n",
" while True:\n",
" for i in range(0, train_data.shape[0], batch_size):\n",
" if (i + batch_size) > train_data.shape[0]:\n",
" break\n",
"\n",
" # get the input and targets of a minibatch\n",
" xs, ts = get_batch(train_data, i, i + batch_size, onehot=False)\n",
"\n",
" # convert from numpy arrays to PyTorch tensors\n",
" xs = torch.Tensor(xs)\n",
" ts = torch.Tensor(ts).long()\n",
"\n",
" # zs = ... # compute prediction logit\n",
" # loss = # compute the total loss\n",
" # ... # compute updates for each parameter\n",
" # ... # make the updates for each parameter\n",
" # ... # a clean up step for PyTorch\n",
"\n",
" # save the current training information\n",
" iters.append(n)\n",
" losses.append(float(loss)/batch_size) # compute *average* loss\n",
"\n",
" if n % 500 == 0:\n",
" iters_sub.append(n)\n",
" train_cost = float(loss.detach().numpy())\n",
" train_acc = estimate_accuracy_torch(model, train_data)\n",
" train_accs.append(train_acc)\n",
" val_acc = estimate_accuracy_torch(model, validation_data)\n",
" val_accs.append(val_acc)\n",
" print(\"Iter %d. [Val Acc %.0f%%] [Train Acc %.0f%%, Loss %f]\" % (\n",
" n, val_acc * 100, train_acc * 100, train_cost))\n",
"\n",
" if (checkpoint_path is not None) and n > 0:\n",
" torch.save(model.state_dict(), checkpoint_path.format(n))\n",
"\n",
" # increment the iteration number\n",
" n += 1\n",
"\n",
" if n > max_iters:\n",
" return iters, losses, iters_sub, train_accs, val_accs\n",
"\n",
"\n",
"def plot_learning_curve(iters, losses, iters_sub, train_accs, val_accs):\n",
" \"\"\"\n",
" Plot the learning curve.\n",
" \"\"\"\n",
" plt.title(\"Learning Curve: Loss per Iteration\")\n",
" plt.plot(iters, losses, label=\"Train\")\n",
" plt.xlabel(\"Iterations\")\n",
" plt.ylabel(\"Loss\")\n",
" plt.show()\n",
"\n",
" plt.title(\"Learning Curve: Accuracy per Iteration\")\n",
" plt.plot(iters_sub, train_accs, label=\"Train\")\n",
" plt.plot(iters_sub, val_accs, label=\"Validation\")\n",
" plt.xlabel(\"Iterations\")\n",
" plt.ylabel(\"Accuracy\")\n",
" plt.legend(loc='best')\n",
" plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pytorch_mlp = PyTorchMLP()\n",
"# learning_curve_info = run_pytorch_gradient_descent(pytorch_mlp, ...)\n",
"\n",
"# you might want to save the `learning_curve_info` somewhere, so that you can plot\n",
"# the learning curve prior to exporting your PDF file\n",
"\n",
"# plot_learning_curve(*learning_curve_info)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (e) -- 3 points\n",
"\n",
"Write a function `make_prediction` that takes as parameters\n",
"a PyTorchMLP model and a sentence (a list of words), and produces\n",
"a prediction for the next word in the sentence.\n",
"\n",
"Start by thinking about what you need to do, step by step, taking\n",
"care of the difference between a numpy array and a PyTorch Tensor."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def make_prediction_torch(model, sentence):\n",
" \"\"\"\n",
" Use the model to make a prediction for the next word in the\n",
"    sentence using its last 3 words (sentence[-3:]). You may assume\n",
"    that len(sentence) >= 3 and that `model` is an instance of\n",
"    PyTorchMLP.\n",
"\n",
" This function should return the next word, represented as a string.\n",
"\n",
" Example call:\n",
" >>> make_prediction_torch(pytorch_mlp, ['you', 'are', 'a'])\n",
" \"\"\"\n",
" global vocab_stoi, vocab_itos\n",
"\n",
" # Write your code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (f) -- 4 points\n",
"\n",
"Use your code to predict what the next word should be in each\n",
"of the following sentences:\n",
"\n",
"- \"You are a\"\n",
"- \"few companies show\"\n",
"- \"There are no\"\n",
"- \"yesterday i was\"\n",
"- \"the game had\"\n",
"- \"yesterday the federal\"\n",
"\n",
"Do your predictions make sense? (If all of your predictions are the same,\n",
"train your model for more iterations, or change the hyperparameters in your\n",
"model. You may need to do this even if your training accuracy is >=38%)\n",
"\n",
"One concern you might have is that our model may be \"memorizing\" information\n",
"from the training set. Check whether each of the 3-grams (the 3 words appearing next\n",
"to each other) appears in the training set. If so, what word occurs immediately\n",
"following those three words?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Write your code and answers here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (g) -- 1 pt\n",
"\n",
"Report the test accuracy of your model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Write your code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 3. Learning Word Embeddings\n",
"\n",
"In this section, we will build a slightly different model with a different\n",
"architecture. In particular, we will first compute a lower-dimensional\n",
"*representation* of the three words, before using a multi-layer perceptron.\n",
"\n",
"Our model will look like this:\n",
"\n",
"\n",
"\n",
"This model has 3 layers instead of 2, but the first layer of the network\n",
"is **not** fully-connected. Instead, we compute the representations of each\n",
"of the three words **separately**. In addition, the first layer of the network\n",
"will not use any biases. The reason for this will be clear in question 4.\n",
"\n",
"### Part (a) -- 10 pts\n",
"\n",
"Complete the methods in `NumpyWordEmbModel`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class NumpyWordEmbModel(object):\n",
" def __init__(self, vocab_size=250, emb_size=100, num_hidden=100):\n",
" self.vocab_size = vocab_size\n",
" self.emb_size = emb_size\n",
" self.num_hidden = num_hidden\n",
" self.emb_weights = np.zeros([emb_size, vocab_size]) # no biases in this layer\n",
" self.weights1 = np.zeros([num_hidden, emb_size * 3])\n",
" self.bias1 = np.zeros([num_hidden])\n",
" self.weights2 = np.zeros([vocab_size, num_hidden])\n",
" self.bias2 = np.zeros([vocab_size])\n",
" self.cleanup()\n",
"\n",
" def initializeParams(self):\n",
" \"\"\"\n",
"        Randomly initialize the weights and biases of this model.\n",
" The randomization is necessary so that each weight is updated to\n",
" a different value.\n",
" \"\"\"\n",
" self.emb_weights = np.random.normal(0, 2/self.num_hidden, self.emb_weights.shape)\n",
"        self.weights1 = np.random.normal(0, 2/(self.emb_size * 3), self.weights1.shape)\n",
"        self.bias1 = np.random.normal(0, 2/(self.emb_size * 3), self.bias1.shape)\n",
" self.weights2 = np.random.normal(0, 2/self.num_hidden, self.weights2.shape)\n",
" self.bias2 = np.random.normal(0, 2/self.num_hidden, self.bias2.shape)\n",
"\n",
" def forward(self, inputs):\n",
" \"\"\"\n",
" Compute the forward pass prediction for inputs.\n",
" Note that `inputs` will be a rank-3 numpy array with shape [N, 3, 250].\n",
"\n",
" For numerical stability reasons, we **do not** apply the softmax\n",
" activation in the forward function. The loss function assumes that \n",
" we return the logits from this function.\n",
" \"\"\"\n",
" # TODO\n",
"\n",
" def __call__(self, inputs):\n",
" return self.forward(inputs)\n",
"\n",
" def backward(self, ts):\n",
" \"\"\"\n",
" Compute the backward pass, given the ground-truth, one-hot targets.\n",
" Note that `ts` needs to be a rank 2 numpy array with shape [N, 250].\n",
"\n",
" Remember the multivariate chain rule: if a weight affects the loss\n",
" through different paths, then the error signal from all the paths\n",
" must be added together.\n",
" \"\"\"\n",
" # TODO\n",
"\n",
" def update(self, alpha):\n",
" \"\"\"\n",
" Compute the gradient descent update for the parameters.\n",
" \"\"\"\n",
" # TODO\n",
"\n",
" def cleanup(self):\n",
" \"\"\"\n",
" Erase the values of the variables that we use in our computation.\n",
" \"\"\"\n",
" # TODO"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (b) -- 1 pt\n",
"\n",
"One strategy that machine learning practitioners use to debug their code\n",
"is to *first try to overfit their model to a small training set*. If the\n",
"gradient computation is correct and the data is encoded properly, then your\n",
"model should easily achieve 100% training accuracy on a small training set.\n",
"\n",
"Show that your model is implemented correctly by showing that your model\n",
"can achieve 100% training accuracy within a few hundred iterations, when\n",
"using a small training set (e.g. one batch)."
]
},
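{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another standard debugging tool, alongside overfitting a small batch, is a finite-difference gradient check: nudge one parameter and compare the numerical slope of the loss against your analytic gradient. Below is a minimal self-contained sketch on a toy one-parameter model (not the assignment classes); the same idea applies entry-by-entry to the weights of `NumpyWordEmbModel`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Toy model: loss(w) = (w*x - t)^2, with analytic gradient 2*(w*x - t)*x.\n",
"x, t = 3.0, 1.5\n",
"\n",
"def loss(w):\n",
"    return (w * x - t) ** 2\n",
"\n",
"def grad(w):\n",
"    return 2 * (w * x - t) * x\n",
"\n",
"w, eps = 0.7, 1e-6\n",
"# Central difference: approximates the derivative from two loss evaluations.\n",
"numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)\n",
"print(abs(numeric - grad(w)))  # should be tiny if the gradient is correct"
]
},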
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# numpy_wordemb = NumpyWordEmbModel()\n",
"# run_gradient_descent(numpy_wordemb, train4grams[:64], batch_size=64, ...)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (c) -- 2 pts\n",
"\n",
"Train your model from part (a) to obtain a training accuracy of at least 25%."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Your code goes here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (d) -- 2 pts\n",
"\n",
"The PyTorch version of the model is implemented for you. Use\n",
"`run_pytorch_gradient_descent` to train\n",
"the PyTorch word embedding model to obtain a training accuracy of at least 38%.\n",
"Plot the learning curve using the `plot_learning_curve` function provided\n",
"to you, and include your plot in your PDF submission.\n",
"\n",
"Make sure that you checkpoint frequently. We will be using ..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class PyTorchWordEmb(nn.Module):\n",
" def __init__(self, emb_size=100, num_hidden=300, vocab_size=250):\n",
" super(PyTorchWordEmb, self).__init__()\n",
" self.word_emb_layer = nn.Linear(vocab_size, emb_size, bias=False)\n",
" self.fc_layer1 = nn.Linear(emb_size * 3, num_hidden)\n",
" self.fc_layer2 = nn.Linear(num_hidden, 250)\n",
" self.num_hidden = num_hidden\n",
" self.emb_size = emb_size\n",
" def forward(self, inp):\n",
" embeddings = torch.relu(self.word_emb_layer(inp))\n",
" embeddings = embeddings.reshape([-1, self.emb_size * 3])\n",
" hidden = torch.relu(self.fc_layer1(embeddings))\n",
" return self.fc_layer2(hidden)\n",
"\n",
"# pytorch_wordemb= PyTorchWordEmb()\n",
"\n",
"# result = run_pytorch_gradient_descent(pytorch_wordemb,\n",
"# max_iters=20000,\n",
"# ...)\n",
"\n",
"# plot_learning_curve(*result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (e) -- 2 pts\n",
"\n",
"Use the function `make_prediction_torch` that you wrote earlier to\n",
"predict what the next word should be in each of the following sentences:\n",
"\n",
"- \"You are a\"\n",
"- \"few companies show\"\n",
"- \"There are no\"\n",
"- \"yesterday i was\"\n",
"- \"the game had\"\n",
"- \"yesterday the federal\"\n",
"\n",
"How do these predictions compare to those of the previous model?\n",
"\n",
"Just like before, if all of your predictions are the same,\n",
"train your model for more iterations, or change the hyperparameters in your\n",
"model. You may need to do this even if your training accuracy is >=38%."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Your code goes here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (f) -- 1 pt\n",
"\n",
"Report the test accuracy of your model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Write your code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 4. Visualizing Word Embeddings\n",
"\n",
"While training the `PyTorchWordEmb` model, we trained the `word_emb_layer`, which takes a one-hot\n",
"representation of a word in our vocabulary, and returns a low-dimensional vector\n",
"representation of that word. In this question, we will explore these word embeddings.\n",
"\n",
"### Part (a) -- 2 pts\n",
"\n",
"The code below extracts the **weights** of the word embedding layer,\n",
"and converts the PyTorch tensor into a numpy array.\n",
"Explain why each *row* of `word_emb` contains the vector representation\n",
"of a word. For example `word_emb[vocab_stoi[\"any\"],:]` contains the\n",
"vector representation of the word \"any\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"word_emb_weights = list(pytorch_wordemb.word_emb_layer.parameters())[0]\n",
"word_emb = word_emb_weights.detach().numpy().T\n",
"\n",
"# Write your explanation here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (b) -- 2 pts\n",
"\n",
"One interesting thing about these word embeddings is that distances\n",
"in these vector representations of words make some sense! To show this,\n",
"we have provided code below that computes the cosine similarity of\n",
"every pair of words in our vocabulary. This code should look familiar,\n",
"since we have seen it in project 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"norms = np.linalg.norm(word_emb, axis=1)\n",
"word_emb_norm = (word_emb.T / norms).T\n",
"similarities = np.matmul(word_emb_norm, word_emb_norm.T)\n",
"\n",
"# Some example distances. The first one should be larger than the second\n",
"print(similarities[vocab_stoi['any'], vocab_stoi['many']])\n",
"print(similarities[vocab_stoi['any'], vocab_stoi['government']])"
]
},
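{
"cell_type": "markdown",
"metadata": {},
"source": [
"To turn a row of a similarity matrix into a ranked list of neighbours, `np.argsort` is the standard tool. Here is a toy illustration with a made-up 4-word vocabulary and a made-up similarity matrix (the assignment's `similarities` matrix works the same way):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"toy_vocab = ['cat', 'dog', 'car', 'bus']\n",
"# Made-up symmetric similarities; the diagonal is self-similarity (1.0).\n",
"sim = np.array([[1.0, 0.8, 0.1, 0.2],\n",
"                [0.8, 1.0, 0.3, 0.1],\n",
"                [0.1, 0.3, 1.0, 0.9],\n",
"                [0.2, 0.1, 0.9, 1.0]])\n",
"\n",
"row = sim[toy_vocab.index('cat')]\n",
"# argsort is ascending, so reverse it; skip the first entry (the word itself).\n",
"order = np.argsort(row)[::-1][1:]\n",
"print([toy_vocab[i] for i in order])  # ['dog', 'bus', 'car']"
]
},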
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Compute the 5 closest words to the following words:\n",
"\n",
"- \"four\"\n",
"- \"go\"\n",
"- \"what\"\n",
"- \"should\"\n",
"- \"school\"\n",
"- \"your\"\n",
"- \"yesterday\"\n",
"- \"not\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Write your code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (c) -- 2 pts\n",
"\n",
"We can visualize the word embeddings by reducing the dimensionality of\n",
"the word vectors to 2D. There are many dimensionality reduction techniques\n",
"that we could use, and we will use an algorithm called t-SNE.\n",
"(You don't need to know what this is for the assignment,\n",
"but we may cover it later in the course.)\n",
"Nearby points in this 2-D space are meant to correspond to nearby points\n",
"in the original, high-dimensional space.\n",
"\n",
"The following code runs the t-SNE algorithm and plots the result.\n",
"Look at the plot and find two clusters of related words.\n",
"What do the words in each cluster have in common?\n",
"\n",
"Note that there is randomness in the initialization of the t-SNE \n",
"algorithm. If you re-run this code, you may get a different image.\n",
"Please make sure to submit your image in the PDF file for your TA to see."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sklearn.manifold\n",
"tsne = sklearn.manifold.TSNE()\n",
"Y = tsne.fit_transform(word_emb)\n",
"\n",
"plt.figure(figsize=(10, 10))\n",
"plt.xlim(Y[:,0].min(), Y[:, 0].max())\n",
"plt.ylim(Y[:,1].min(), Y[:, 1].max())\n",
"for i, w in enumerate(vocab):\n",
" plt.text(Y[i, 0], Y[i, 1], w)\n",
"plt.show()"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 2
}