{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Recurrent Neural Networks\n",
"\n",
"Last time, before the midterm, we discussed using recurrent neural networks\n",
"to make predictions about sequences. In particular, we treated tweets\n",
"as a **sequence** of words. Since tweets can have a variable number of words,\n",
"we needed an architecture that can take as input variable-sized inputs.\n",
"\n",
"The recurrent neural network architecture looked something like this:\n",
"\n",
"\n",
"\n",
"We briefly discussed how recurrent neural networks can be used to **generate**\n",
"sequences. Generating sequences is more involved compared to making predictions\n",
"about sequences. However, many students chose text generations problems for their\n",
"project, so a brief discussion on generating text might be worthwhile.\n",
"\n",
"Much of today's content is an adaptation of the \"Practical PyTorch\" github \n",
"repository [1].\n",
"\n",
"[1] https://github.com/spro/practical-pytorch/blob/master/char-rnn-generation/char-rnn-generation.ipynb\n",
"\n",
"## Preparing the Data Set\n",
"\n",
"We will begin by choosing some text to generate. Since we are already working\n",
"\"SMS Spam Collection Data Set\" [2], we will build a model to generate spam\n",
"SMS text messages.\n",
"\n",
"[2] http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn as nn\n",
"import torch.nn.functional as F\n",
"import torch.optim as optim"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are going to start by doing something a little strange: we are going\n",
"to concatenate all spam messages into a **single** string. We will sample\n",
"random subsequences (chunk) from the combined string containing all spam messages.\n",
"\n",
"This technique makes less sense when we use short strings like SMS text messages,\n",
"but makes more sense we are working with sequences that are much longer than\n",
"the random subsequence samples (chunk) -- for example if we trained on news articles,\n",
"Wikipedia pages, Shakespeare plays, or TV scripts. In all cases, the probability\n",
"of choosing a chunk that contains text from two samples will be small.\n",
"\n",
"In our case, we could do better than combining all training text into one string.\n",
"For simplicity, however, we won't."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"spam_text = \"\"\n",
"for line in open('SMSSpamCollection'):\n",
" if line.startswith(\"spam\"):\n",
" spam_text += line.split(\"\\t\")[1].strip(\"\\n\")\n",
"\n",
"# show the first 100 characters\n",
"spam_text[:100]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since we are working with SMS text messages, we will use a character-level RNN.\n",
"The reason is that spammy SMS messages will contain not only words, but\n",
"abbreviations,\n",
"numbers and other non-word characters.\n",
"\n",
"We find all the possible characters in `spam_text`, and build dictionary mappings\n",
"from the character to the index of that character (a unique integer identifier),\n",
"and from the index to the character. We'll use the same naming scheme that `torchtext`\n",
"uses (`stoi` and `itos`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"vocab = list(set(spam_text))\n",
"vocab_stoi = {s: i for i, s in enumerate(vocab)}\n",
"vocab_itos = {i: s for i, s in enumerate(vocab)}\n",
"len(vocab)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are 94 unique characters in our training data set.\n",
"\n",
"Now, we'll write a function to select a random chunk. Each time we need a new\n",
"training example, we will call `random_chunk()` to obtain a random subsequence \n",
"of `spam_text`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"random.seed(7)\n",
"\n",
"spam_len = len(spam_text)\n",
"\n",
"def random_chunk(chunk_len=50):\n",
" \"\"\"Return a random subsequence from `spam_text`\"\"\"\n",
" start_index = random.randint(0, spam_len - chunk_len)\n",
" end_index = start_index + chunk_len + 1\n",
" return spam_text[start_index:end_index]\n",
"\n",
"print(random_chunk())\n",
"print(random_chunk())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since we will use one-hot embedding to represent each character, we need to\n",
"look up the *indices* of each character in a chunk. We will also combine\n",
"the indicies of each character into a tensor."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def text_to_tensor(text, vocab=vocab):\n",
" \"\"\"Return a tensor containing the indices of characters in `text`.\"\"\"\n",
" indices = [vocab_stoi[ch] for ch in text]\n",
" return torch.tensor(indices)\n",
"\n",
"print(text_to_tensor(random_chunk()))\n",
"print(text_to_tensor(random_chunk()))"
]
},
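{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside, the model below will convert these token indices into one-hot\n",
"vectors by indexing into an identity matrix: row `i` of the identity matrix\n",
"is exactly the one-hot vector for index `i`. Here is a small sketch of the\n",
"trick, using a toy vocabulary of size 5:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# row i of the identity matrix is the one-hot vector for index i\n",
"ident = torch.eye(5)\n",
"ident[torch.tensor([1, 3])]  # one-hot vectors for indices 1 and 3"
]
},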
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will use these tensors to train our RNN model. But how?\n",
"\n",
"At a very high level, we want our RNN model to have a high probability\n",
"of generating the text in our training set. An RNN model generates text\n",
"one character at a time based on the hidden state value.\n",
"We can check, at each time step, whether the model generates\n",
"the correct next character. That is, at each time step,\n",
"we are trying to select the correct next character out of all the \n",
"characters in our vocabulary. Recall that this problem is a multi-class\n",
"classification problem.\n",
"\n",
"However, unlike multi-class classification problems with fixed-sized inputs,\n",
"we need to keep track of the hidden state. In particular, we need to update\n",
"the hidden state with the actual, ground-truth characters at each time step.\n",
"\n",
"So, if we are training on the string `RIGHT`, we will do something like this:\n",
"\n",
"\n",
"\n",
"We will start with some sequence to produce an initial hidden state\n",
"(first green box from the left), and the RNN model will make a prediction\n",
"on what letter should appear next.\n",
"\n",
"Then, we will feed the correct\n",
"letter \"R\" as the next token in the sequence, to produce a new hidden\n",
"state (second green box from the left). We use this new hidden state\n",
"to predict what letter should appear next. \n",
"\n",
"Again, we will feed the correct\n",
"letter \"I\" as the next token in the sequence, to produce a new hidden\n",
"state (third green box from the left). We continue until we exhaust the\n",
"entire sequence.\n",
"\n",
"In this example, we are (somewhat simultaneously) solving many different\n",
"multi-class classification problems. We know the ground-truth answer for\n",
"those all problems, meaning that we can use a cross-entropy loss and\n",
"the usual optimizers to train our recurrent neural network weights.\n",
"\n",
"To set our data up for training, we will separate the input sequence\n",
"(bottom row in the above diagram) and the target output sequence\n",
"(top row in the above diagram). The two sequences are really just offset by one."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def random_training_set(chunk_len=50): \n",
" chunk = random_chunk(chunk_len)\n",
" inp = text_to_tensor(chunk[:-1]) # omit the last token\n",
" target = text_to_tensor(chunk[1:]) # omit the first token\n",
" return inp, target\n",
"\n",
"random_training_set(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The RNN Model\n",
"\n",
"We are ready to build the recurrent neural network model. The model\n",
"has two main trainable components, an RNN model (in this case, `nn.LSTM`)\n",
"and a \"decoder\" model that decodes RNN outputs into a distribution\n",
"over the possible characters in our vocabulary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class SpamGenerator(nn.Module):\n",
" def __init__(self, vocab_size, hidden_size, n_layers=1):\n",
" super(SpamGenerator, self).__init__()\n",
" # RNN attributes\n",
" self.vocab_size = vocab_size\n",
" self.hidden_size = hidden_size\n",
" self.n_layers = n_layers\n",
" # identiy matrix for generating one-hot vectors\n",
" self.ident = torch.eye(vocab_size)\n",
" # recurrent neural network\n",
" self.rnn = nn.RNN(vocab_size, hidden_size, n_layers, batch_first=True)\n",
" # a fully-connect layer that decodes the RNN output to\n",
" # a distribution over the vocabulary\n",
" self.decoder = nn.Linear(hidden_size, vocab_size)\n",
" \n",
" def forward(self, inp, hidden):\n",
" # reshape the input tensor to [1, seq_length]\n",
" inp = inp.view(1, -1)\n",
" # generate one-hot vectors from token indices\n",
" inp = self.ident[inp]\n",
" # obtain the next output and hidden state\n",
" output, hidden = self.rnn(inp, hidden)\n",
" # run the decoder\n",
" output = self.decoder(output.squeeze(0))\n",
" return output, hidden\n",
"\n",
" def init_hidden(self):\n",
" return torch.zeros(self.n_layers, 1, self.hidden_size)\n",
" \n",
"model = SpamGenerator(len(vocab), 128)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the RNN Model\n",
"\n",
"Before actually training our model, let's go back\n",
"to the figure from earlier, and write code to train\n",
"our model for *one* iteration.\n",
"\n",
"\n",
"\n",
"First of all, we can generate some training data. We'll use a small chunk\n",
"size for now."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chunk_len = 20\n",
"inp, target = random_training_set(chunk_len)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Second of all, we need a loss function and optimizer. Since we are performing\n",
"multi-class classification for each character we wish to produce, we will use\n",
"the cross entropy loss."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"criterion = nn.CrossEntropyLoss()\n",
"optimizer = torch.optim.Adam(model.parameters(), lr=0.005)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we will perform the first classification problem (the second column\n",
"in the figure). We start with a new hidden state (of all zeros):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hidden = model.init_hidden()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, we will feed the next token to the RNN, producing an `output` vector\n",
"and a new hidden state."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"output, hidden = model(inp[0], hidden)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can compute the loss using `criterion`. Since the model is untrained,\n",
"the loss is expected to be high. (For now, we won't do anything\n",
"with this loss, and omit the backward pass.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"criterion(output, target[0].unsqueeze(0))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With our new hidden state, we can solve the problem of predicting the *next*\n",
"token;"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"output, hidden = model(inp[1], hidden) # predict distribution of next token\n",
"criterion(output, target[1].unsqueeze(0)) # compute the loss"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can write a loop to do the entire computation.\n",
"Alternatively, we can simply call:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hidden = model.init_hidden()\n",
"output, hidden = model(inp, hidden) # predict distribution of next token\n",
"criterion(output, target) # compute the loss"
]
},
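{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the explicit loop mentioned above would look something like\n",
"this. It is a sketch that computes the same average per-step loss, one\n",
"character at a time (recall that `nn.CrossEntropyLoss` averages over its\n",
"inputs by default):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hidden = model.init_hidden()\n",
"total_loss = 0\n",
"for i in range(len(inp)):\n",
"    # feed one character; compare the prediction against the next character\n",
"    output, hidden = model(inp[i], hidden)\n",
"    total_loss += criterion(output, target[i].unsqueeze(0))\n",
"total_loss / len(inp)  # average per-step loss"
]
},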
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generating Text\n",
"\n",
"Before we actually train our RNN model, we should talk about how\n",
"we will actually use the RNN model to generate text. If we can \n",
"generate text, we can make a qualitative asssessment of how well\n",
"our RNN is performing.\n",
"\n",
"The main difference between training and test-time (generation time)\n",
"is that we don't have the ground-truth tokens to feed as inputs\n",
"to the RNN. Instead, we will take the **output token** generated\n",
"in the previous timestep as input.\n",
"\n",
"We will also \"prime\" our RNN hidden state. That is, instead of\n",
"starting with a hidden state vector of all zeros, we will feed\n",
"a small number of tokens into the RNN first.\n",
"\n",
"Lastly, at each time step, instead of always selecting the\n",
"token with the largest probability, we will add some randomness.\n",
"That is, we will use the logit outputs from our model to\n",
"construct a multinomial distribution over the tokens,\n",
"and sample a random token from that multinomial distribution.\n",
"\n",
"One natural multinomial distribution we can choose is the \n",
"distribution we get after applying the softmax on the outputs.\n",
"However, we will do one more thing: we will add a **temperature**\n",
"parameter to manipulate the softmax outputs. We can set a\n",
"**higher temperature** to make the probability of each token\n",
"**more even** (more random), or a **lower temperature** to assighn\n",
"more probability to the tokens with a higher logit (output).\n",
"A **higher temperature** means that we will get a more diverse sample,\n",
"with potentially more mistakes. A **lower temperature** means that we\n",
"may see repetitions of the same high probability sequence."
]
},
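{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what the temperature does numerically, here is a small sketch with\n",
"made-up logits: dividing the logits by a high temperature flattens the\n",
"resulting softmax distribution, while a low temperature sharpens it around\n",
"the largest logit."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"logits = torch.tensor([2.0, 1.0, 0.1])  # made-up logit outputs\n",
"for temperature in [0.2, 1.0, 5.0]:\n",
"    probs = F.softmax(logits / temperature, dim=0)\n",
"    print(temperature, probs)"
]
},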
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def evaluate(model, prime_str='win', predict_len=100, temperature=0.8):\n",
" hidden = model.init_hidden()\n",
" prime_input = text_to_tensor(prime_str)\n",
" predicted = prime_str\n",
" \n",
" # Use priming string to \"build up\" hidden state\n",
" for p in range(len(prime_str) - 1):\n",
" _, hidden = model(prime_input[p], hidden)\n",
" inp = prime_input[-1]\n",
" \n",
" for p in range(predict_len):\n",
" output, hidden = model(inp, hidden)\n",
" \n",
" # Sample from the network as a multinomial distribution\n",
" output_dist = output.data.view(-1).div(temperature).exp()\n",
" top_i = int(torch.multinomial(output_dist, 1)[0])\n",
" # Add predicted character to string and use as next input\n",
" predicted_char = vocab_itos[top_i]\n",
" predicted += predicted_char\n",
" inp = text_to_tensor(predicted_char)\n",
"\n",
" return predicted\n",
"\n",
"print(evaluate(model, predict_len=20))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is hard to see the effect of the `temperature` parameter with\n",
"an untrained model, so we will come back to this idea after training\n",
"our model.\n",
"\n",
"\n",
"## Training\n",
"\n",
"We can put everything we have done together to train the model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def train(model, num_iters=2000, lr=0.004):\n",
" optimizer = torch.optim.Adam(model.parameters(), lr=lr)\n",
" criterion = nn.CrossEntropyLoss()\n",
" for it in range(num_iters):\n",
" # get training set\n",
" inp, target = random_training_set()\n",
" # cleanup\n",
" optimizer.zero_grad()\n",
" # forward pass\n",
" hidden = model.init_hidden()\n",
" output, _ = model(inp, hidden)\n",
" loss = criterion(output, target)\n",
" # backward pass\n",
" loss.backward()\n",
" optimizer.step()\n",
"\n",
" if it % 200 == 199:\n",
" print(\"[Iter %d] Loss %f\" % (it+1, float(loss)))\n",
" print(\" \" + evaluate(model, ' ', 50))\n",
"\n",
"train(model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Gated Recurrent Units\n",
"\n",
"Last time, we discussed the Long Short-Term Memory (LSTM) model\n",
"`nn.LSTM` as an alternative to `nn.RNN`. We did not use `nn.LSTM`\n",
"since the `nn.LSTM` model requires both a hidden and a cell-state.\n",
"We can switch our model to use `nn.LSTM` if we want to, and\n",
"obtain a better performance.\n",
"\n",
"Instead, there is another RNN model we could use called the\n",
"\"Gated Recurrent Unit\" `nn.GRU`. This is a newer model than the LSTM,\n",
"and a smaller model that uses some of the key ideas of the LSTM.\n",
"Like the LSTM,\n",
"GRU units are also capable of learning long-term dependencies.\n",
"GRU units perform about as well as the LSTM, but does not have the\n",
"cell state. \n",
"\n",
"In our code, we can swap in the `nn.GRU` unit in place of the `nn.RNN`\n",
"unit. Let's make the swap. We should see a performance boost."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class SpamGenerator(nn.Module):\n",
" def __init__(self, vocab_size, hidden_size, n_layers=1):\n",
" super(SpamGenerator, self).__init__()\n",
" # RNN attributes\n",
" self.vocab_size = vocab_size\n",
" self.hidden_size = hidden_size\n",
" self.n_layers = n_layers\n",
" # identiy matrix for generating one-hot vectors\n",
" self.ident = torch.eye(vocab_size)\n",
" # recurrent neural network\n",
" self.rnn = nn.GRU(vocab_size, hidden_size, n_layers, batch_first=True)\n",
" # a fully-connect layer that decodes the RNN output to\n",
" # a distribution over the vocabulary\n",
" self.decoder = nn.Linear(hidden_size, vocab_size)\n",
" \n",
" def forward(self, inp, hidden):\n",
" # reshape the input tensor to [1, seq_length]\n",
" inp = inp.view(1, -1)\n",
" # generate one-hot vectors from token indices\n",
" inp = self.ident[inp]\n",
" # obtain the next output and hidden state\n",
" output, hidden = self.rnn(inp, hidden)\n",
" # run the decoder\n",
" output = self.decoder(output.squeeze(0))\n",
" return output, hidden\n",
"\n",
" def init_hidden(self):\n",
" return torch.zeros(self.n_layers, 1, self.hidden_size)\n",
" \n",
"model = SpamGenerator(len(vocab), 128)\n",
"train(model, num_iters=5000)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Temperature\n",
"\n",
"Now let's look at the effect of temperature. We'll start with a very \n",
"low temperature:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for i in range(10):\n",
" print(evaluate(model, ' ', 50, temperature=0.2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice how we get fairly good samples, but they are all\n",
"very similar to each other.\n",
"\n",
"If we increase the temperature, we get more diverse sequences.\n",
"However, the quality of the samples are not as good:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for i in range(10):\n",
" print(evaluate(model, 'win', 50, temperature=0.8))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, if we increase the temperature too much, we get\n",
"very diverse samples, but the quality becomes increasingly poor."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for i in range(10):\n",
" print(evaluate(model, 'win', 50, temperature=1.2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we increase the temperature enough, we might as well generate\n",
"random sequences."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for i in range(10):\n",
" print(evaluate(model, 'win', 50, temperature=3))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}