Why is an autoregressive model called "memoryless"? After all, it does base its predictions on the past.
Answer notes: there is a fixed distance in time beyond which nothing is remembered; an autoregressive model of order K sees only its K most recent inputs, so anything older cannot influence its predictions. (See the RNN diagram below, and the first sketch at the end of this section.)

How could such a system be used for an RNN that produces a sentence of text, one letter at a time? What would be the input; what would be the output; what would be the training signal; which units at which time slices would represent the input and output?
Answer notes: the system tries to guess the next letter, given the preceding ones. The input units at time slice t hold the letter at position t (e.g. as a one-hot vector), the output units at time slice t hold a predicted distribution over the letter at position t+1, and the training signal is the actual next letter (see the second sketch at the end of this section).

Could an RNN nicely serve as the brain of a robot? Why or why not, i.e. what is good about this idea and what is bad about it? What would the details of the set-up be?
Answer notes: a robot needs reinforcement learning, whereas RNNs are made for supervised learning.

Let's say we have an RNN with N units (e.g. N = 100). All units connect to all other units (and even to themselves). We have data that consists of T time steps (e.g. T = 35).

1. What is the asymptotic runtime of computing the gradient for one training case, using the "backpropagation through time" algorithm?
Answer: O(T·N²). Each of the T time steps requires a multiplication by the N×N recurrent weight matrix, in both the forward and the backward pass.

2. What is the asymptotic memory requirement for computing the gradient for one training case, using the "backpropagation through time" algorithm?
Answer: O(T·N²) for the naive version, which gives every time slice of the unrolled network its own copy of the N×N weight matrix; O(N² + T·N) for the optimized version, which stores the weights once and keeps only the N unit activations at each of the T time steps for the backward pass. The third sketch below makes both counts concrete.
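
A minimal sketch of the "memoryless" point, assuming an order-3 linear autoregressive model with made-up weights; only the last K inputs can ever affect the prediction:

```python
# A fixed-order autoregressive model is "memoryless" in the sense that any
# event more than K steps in the past cannot influence its prediction.
# The order K and the weights are illustrative assumptions.
import numpy as np

K = 3                                  # fixed "order": the model's entire memory
w = np.array([0.5, 0.3, 0.2])          # one (made-up) weight per lagged input

def predict_next(history):
    """Predict the next value from the last K values only."""
    window = np.asarray(history[-K:])  # everything older than K steps is invisible
    return float(w @ window[::-1])     # w[0] weights the most recent value

x = [9.0, 0.0, 0.0, 0.0]
print(predict_next(x))                 # 0.0: the spike at t=0 is already forgotten
```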
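
A sketch of the next-letter set-up, assuming one-hot letter inputs, a tanh recurrent layer, a softmax output over the alphabet, and cross-entropy against the actual next letter as the training signal; the alphabet, weight scales, and sample string are illustrative, not from the original notes:

```python
import numpy as np

rng = np.random.default_rng(0)
alphabet = "abcdefghijklmnopqrstuvwxyz "
V, N = len(alphabet), 100                    # vocabulary size, hidden units

Wxh = rng.normal(0, 0.01, (N, V))            # input -> hidden
Whh = rng.normal(0, 0.01, (N, N))            # hidden -> hidden (all-to-all, incl. self)
Why = rng.normal(0, 0.01, (V, N))            # hidden -> output

def one_hot(c):
    v = np.zeros(V)
    v[alphabet.index(c)] = 1.0
    return v

def forward(text):
    """At each time slice t: input = letter t on the input units, output = a
    distribution over letter t+1; the loss compares it to the actual next letter."""
    h, loss = np.zeros(N), 0.0
    for t in range(len(text) - 1):
        h = np.tanh(Wxh @ one_hot(text[t]) + Whh @ h)   # recurrent hidden state
        logits = Why @ h
        p = np.exp(logits - logits.max())
        p /= p.sum()                                    # softmax over the alphabet
        loss -= np.log(p[alphabet.index(text[t + 1])])  # training signal
    return loss / (len(text) - 1)

print(forward("the quick brown fox"))        # average per-letter cross-entropy
```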
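
A sketch of backpropagation through time for the all-to-all network, assuming tanh units and an illustrative loss (the sum of the final hidden state), to make the complexity counts concrete: each pass does one N×N matrix-vector product per time step, giving O(T·N²) runtime, and storing the weights once plus one length-N activation vector per step gives the optimized O(N² + T·N) memory:

```python
import numpy as np

def bptt(W, x):
    """W: N x N recurrent weights; x: T x N external inputs, one row per time step."""
    T, N = x.shape
    h = np.zeros((T + 1, N))                  # T*N memory: saved activations
    for t in range(T):                        # forward: T products of N^2 work each
        h[t + 1] = np.tanh(W @ h[t] + x[t])
    dW = np.zeros_like(W)                     # N^2 memory: one shared gradient matrix
    dh = np.ones(N)                           # illustrative loss: sum of final state
    for t in reversed(range(T)):              # backward: T products of N^2 work each
        dz = dh * (1.0 - h[t + 1] ** 2)       # through the tanh nonlinearity
        dW += np.outer(dz, h[t])              # same W at every slice, so one dW
        dh = W.T @ dz
    return dW

T, N = 35, 100
rng = np.random.default_rng(1)
print(bptt(rng.normal(0, 0.1, (N, N)), rng.normal(0, 0.1, (T, N))).shape)
```

Because the same weight matrix is shared across all T time slices, the gradient contributions are accumulated into a single N×N matrix dW rather than stored per slice; that accumulation is exactly what drops the memory cost from O(T·N²) to O(N² + T·N).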