# coding: utf-8
# # CSC338. Homework 6
#
# Due Date: Wednesday March 4, 9pm
#
# Please see the guidelines at https://www.cs.toronto.edu/~lczhang/338/homework.html
#
# ### What to Hand In
#
# Please hand in 2 files:
#
# - Python File containing all your code, named `hw6.py`.
# - PDF file named `hw6_written.pdf` containing your solutions to the written parts of the assignment. Your solution can be hand-written, but must be legible. Graders may deduct marks for illegible or poorly presented solutions.
#
# If you are using Jupyter Notebook to complete the work, your notebook can be exported as a .py file (File -> Download As -> Python). Your code will be auto-graded using Python 3.6, so please make sure that your code runs. There will be a 20% penalty if you need a remark due to small issues that render your code untestable.
#
# **Make sure to remove or comment out all matplotlib or other expensive code before submitting your homework!**
#
# Submit the assignment on **MarkUs** by 9pm on the due date. See the syllabus for the course policy regarding late assignments. All assignments must be done individually.
# In[ ]:
import math
import numpy as np
# ## Question 1
#
# ### Part (a) -- 3 pt
#
# Write a function `solve_normal_equation` that takes an $m \times n$ matrix $A$ (with $m > n$ and
# $rank(A) = n$) and a vector $b$,
# and solves the linear least squares problem $A x \approx b$ using the Normal Equation method.
#
# Use the `cholesky_factorize` function and other functions that you wrote in previous
# homeworks, but do not use any functions from the package `np.linalg`.
# In[ ]:
def solve_normal_equation(A, b):
"""
Return the solution x to the linear least squares problem
$$Ax \approx b$$ using normal equations,
where A is an (m x n) matrix, with m > n, rank(A) = n, and
b is a vector of size (m)
"""
# ### Part (b) -- 1 pt
#
# Consider the linear least squares system below, where $A$ is a $50 \times 3$ matrix,
# and $b$ is a vector of size $50$. The system is derived from the "iris" dataset.
#
# Use the `solve_normal_equation` function you wrote in part (a) to find the vector `x`
# that minimizes the norm of `matmul(A, x) - b`.
#
# Store the solution in the variable `iris_x` and the 2-norm of the residual
# in `iris_residual`. You may use the function `np.linalg.norm`.
# In[ ]:
irisA = np.array([[4.8, 6.3, 5.7, 5.1, 7.7, 5.6, 4.9, 4.4, 6.4, 6.2, 6.7, 4.5, 6.3,
4.8, 5.8, 4.7, 5.4, 7.4, 6.4, 6.3, 5.1, 5.7, 6.5, 5.5, 7.2, 6.9,
6.8, 6. , 5. , 5.4, 5.6, 6.1, 5.9, 5.6, 6. , 6. , 4.4, 6.9, 6.7,
5.1, 6. , 6.3, 4.6, 6.7, 5. , 6.7, 5.8, 5.1, 5.2, 6.1],
[3.1, 2.5, 3. , 3.7, 3.8, 3. , 3.1, 2.9, 2.9, 2.9, 3.1, 2.3, 2.5,
3.4, 2.7, 3.2, 3.7, 2.8, 2.8, 3.3, 2.5, 4.4, 3. , 2.6, 3.2, 3.2,
3. , 2.9, 3.6, 3.9, 3. , 2.6, 3.2, 2.8, 2.7, 2.2, 3.2, 3.1, 3.1,
3.8, 2.2, 2.7, 3.1, 3. , 3.5, 2.5, 4. , 3.5, 3.4, 3. ],
[1.6, 4.9, 4.2, 1.5, 6.7, 4.5, 1.5, 1.4, 4.3, 4.3, 5.6, 1.3, 5. ,
1.6, 5.1, 1.3, 1.5, 6.1, 5.6, 6. , 3. , 1.5, 5.8, 4.4, 6. , 5.7,
5.5, 4.5, 1.4, 1.3, 4.1, 5.6, 4.8, 4.9, 5.1, 5. , 1.3, 4.9, 4.4,
1.5, 4. , 4.9, 1.5, 5.2, 1.3, 5.8, 1.2, 1.4, 1.4, 4.9]])
irisb = np.array([0.2, 1.5, 1.2, 0.4, 2.2, 1.5, 0.1, 0.2, 1.3, 1.3, 2.4, 0.3, 1.9,
0.2, 1.9, 0.2, 0.2, 1.9, 2.1, 2.5, 1.1, 0.4, 2.2, 1.2, 1.8, 2.3,
2.1, 1.5, 0.2, 0.4, 1.3, 1.4, 1.8, 2. , 1.6, 1.5, 0.2, 1.5, 1.4,
0.3, 1. , 1.8, 0.2, 2.3, 0.3, 1.8, 0.2, 0.2, 0.2, 1.8])
A = irisA.T # transpose
b = irisb
iris_x = None
iris_residual = None
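# A sketch of the residual computation on a tiny hypothetical system (the
# `demo_*` names are illustrative, not part of the assignment): once a least
# squares solution x is known, the residual 2-norm is $||Ax - b||_2$.
# In[ ]:
import numpy as np
demo_A = np.array([[1., 0.], [0., 1.], [1., 1.]])
demo_b = np.array([1., 2., 2.])
demo_x = np.array([2/3, 5/3])  # least squares solution of demo_A x ~ demo_b
demo_residual = np.linalg.norm(np.matmul(demo_A, demo_x) - demo_b)
# demo_residual is 1/sqrt(3): the residual vector is (-1/3, -1/3, 1/3)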
# ## Question 2
#
# ### Part (a) -- 3 pt
#
# Write a function `householder_v` that returns the vector $v$ that defines
# the Householder transform
#
# $$H = I - 2 \frac{v v^T} {v^T v}$$
#
# that eliminates all but the first element of a vector $a$.
# You may use the function `np.linalg.norm`.
# In[ ]:
def householder_v(a):
"""Return the vector $v$ that defines the Householder Transform
H = I - 2 np.matmul(v, v.T) / np.matmul(v.T, v)
that will eliminate all but the first element of the
input vector a. Choose the $v$ that does not result in
cancellation.
Do not modify the vector `a`.
Example:
>>> a = np.array([2., 1., 2.])
>>> householder_v(a)
array([5., 1., 2.])
>>> a
array([2., 1., 2.])
"""
# ### Part (b) -- 2 pt
#
# Show that a Householder Transformation $H$ is orthogonal.
# Include your solution in your PDF writeup.
#
# ### Part (c) -- 2 pt
#
# Write a function `apply_householder` that applies the Householder
# transform defined by a vector $v$ to a vector $u$. You should **not**
# compute the Householder transform matrix $H$. You should only
# need to compute vector-vector dot products and vector-scalar multiplications.
# In[ ]:
def apply_householder(v, u):
"""Return the result of the Householder transformation defined
by the vector $v$ applied to the vector $u$. You should not
compute the Householder matrix H directly.
Example:
>>> apply_householder(np.array([5., 1., 2.]), np.array([2., 1., 2.]))
array([-3., 0., 0.])
>>> apply_householder(np.array([5., 1., 2.]), np.array([2., 3., 4.]))
array([-5. , 1.6, 1.2])
"""
# ### Part (d) -- 3 pt
#
# Write a function `apply_householder_matrix` that applies the Householder
# transform defined by a vector $v$ to all the columns of a matrix $U$.
# You should **not** compute the Householder transform matrix $H$.
#
# **Do not use for loops.** Instead, you may find the numpy function `np.outer` useful.
# In[ ]:
def apply_householder_matrix(v, U):
"""Return the result of the Householder transformation defined
by the vector $v$ applied to all the vectors in the matrix U.
You should not compute the Householder matrix H directly.
Example:
>>> v = np.array([5., 1., 2.])
>>> U = np.array([[2., 2.],
[1., 3.],
[2., 4.]])
>>> apply_householder_matrix(v, U)
array([[-3. , -5. ],
[ 0. , 1.6],
[ 0. , 1.2]])
"""
# ### Part (e) -- 3 pt
#
# Write a function `solve_qr_householder` that takes an $m \times n$ matrix $A$ and a vector $b$,
# and solves the linear least squares problem $A x \approx b$ using Householder QR Factorization.
# You may use `np.linalg.solve` to solve any square system of the form $Ax = b$ that you produce.
#
# You should use the helper function `qr_householder` that takes a matrix A and a vector b and
# performs the Householder QR Factorization using the functions you wrote in parts (a), (c), and (d).
# In[ ]:
def qr_householder(A, b):
"""Return the matrix [R O]^T, and vector [c1 c2]^T equivalent
to the system $Ax \approx b$. This algorithm is similar to
Algorithm 3.1 in the textbook.
"""
for k in range(A.shape[1]):
v = householder_v(A[k:, k])
if np.linalg.norm(v) != 0:
A[k:, k:] = apply_householder_matrix(v, A[k:, k:])
b[k:] = apply_householder(v, b[k:])
    # now, the first n rows of A form the upper-triangular factor R
return A, b
def solve_qr_householder(A, b):
"""
Return the solution x to the linear least squares problem
$$Ax \approx b$$ using Householder QR decomposition.
Where A is an (m x n) matrix, with m > n, rank(A) = n, and
b is a vector of size (m)
"""
# ## Question 3
#
# For the next few questions, we will use the MNIST dataset for digit recognition.
# Download the files `mnist_images.npy` and `mnist_labels.npy` from the course
# website, and place them into the same folder as your ipynb file.
#
# The code below loads the data, splits it into "train" and "test" sets,
# and plots a subset of the data. We will use `train_images` and `train_labels`
# to set up Linear Least Squares problems. We will use `test_images` and `test_labels`
# to test the models that we build.
# In[ ]:
mnist_images = np.load("mnist_images.npy")
test_images = mnist_images[4500:] # 500 images
train_images = mnist_images[:4500] # 4500 images
mnist_labels = np.load("mnist_labels.npy")
test_labels = mnist_labels[4500:]
train_labels = mnist_labels[:4500]
def plot_mnist(remove_border=False):
""" Plot the first 40 data points in the MNIST train_images. """
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 5))
for i in range(4 * 10):
plt.subplot(4, 10, i+1)
if remove_border:
plt.imshow(train_images[i,4:24,4:24])
else:
            plt.imshow(train_images[i])

# plot_mnist()  # please comment out this line before submission
# ### Part (a) -- 1 pt
#
# How many examples of each digit are in `train_images`? You should use the
# information in `train_labels`.
#
# Some of the code in the later part of this
# question might be helpful. You might also find the `sum` function helpful.
#
# Save your result in the array `mnist_digits`, where
# `mnist_digits[0]` should contain the number of digit `0` in `train_images`,
# `mnist_digits[1]` should contain the number of digit `1` in `train_images`, etc.
# In[ ]:
mnist_digits = []
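# A counting sketch on hypothetical labels (`demo_*` names are illustrative):
# `labels == d` is a boolean array, and summing it counts how many entries
# equal the digit d.
# In[ ]:
import numpy as np
demo_labels = np.array([0, 1, 1, 7, 0, 3])
demo_counts = [int(np.sum(demo_labels == d)) for d in range(10)]
# demo_counts == [2, 2, 0, 1, 0, 0, 0, 1, 0, 0]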
# ### Part (b) -- 2 pt
#
# We will build a rudimentary model to predict whether a digit is the
# digit `0`. Our features will be the intensity at each pixel.
# There are 28 * 28 = 784 pixels in each image. However, in order
# to obtain a matrix $A$ that is of full rank, we will ignore the
# pixels along the border. That is, we will only use 400 pixels
# in the center of the image.
#
# Look at a few of the MNIST images using the `plot_mnist` function
# written for you. Why would our matrix $A$ not be full rank if we
# use all 784 pixels in our model?
#
# If this question doesn't make sense yet, you might want to
# come back to it after completing the rest of this question.
#
# Include your solution in your PDF writeup.
# In[ ]:
# Plot the MNIST images, with only the center pixels that we will use
# in our digit classification model.
#plot_mnist(True) # please comment out this line before submission
# ### Part (c) -- 1 pt
#
# We will now build a rudimentary model to predict whether a digit is the
# digit `0`. To obtain a matrix of full rank, we will use the 400 pixels
# at the center of each image as features. Our target will
# be whether our digit is `0`.
#
# In short, the model we will build looks like this:
#
# $$x_1 p_1 + x_2 p_2 + ... + x_{400} p_{400} = y$$
#
# Where $p_i$ is the pixel intensity at pixel $i$ (the ordering of the pixels
# doesn't actually matter), and the value of $y$ determines whether our digit is a `0` or not.
#
# We will solve for the coefficients $x_i$ by solving the linear
# least squares problem $Ax \approx b$, where $A$ is constructed using
# the pixel intensities of the images in `train_images`, and $b$ is constructed
# using the labels for those images. For convenience, we will set $y = 1$ for
# images of the digit `0`, and $y=0$ for other digits.
#
# **We should stress that in real machine learning courses,
# you will learn that this is not the proper way to build a digit detector.**
# However, digit detection is quite fun, so we might as well use the tools
# that we have to try to solve the problem.
#
# The code below obtains the matrices $A$ and the vector $b$ of our least squares
# problem, where $A$ is a $m \times n$ matrix and $b$ is a vector of length $m$.
#
# What is the value of $m$ and $n$? Save the values in the variables `mnist_m` and `mnist_n`.
# In[ ]:
A = train_images[:, 4:24, 4:24].reshape([-1, 20*20])
b = (train_labels == 0).astype(np.float32)
mnist_m = None
mnist_n = None
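# A shape sketch on hypothetical data (`demo_*` names are illustrative):
# cropping the 4:24 window of each 28 x 28 image and flattening leaves one
# 400-feature row per image, so m is the number of images and n the number
# of pixels kept.
# In[ ]:
import numpy as np
demo_images = np.zeros([7, 28, 28])  # 7 stand-in "images"
demo_A = demo_images[:, 4:24, 4:24].reshape([-1, 20 * 20])
# demo_A.shape == (7, 400)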
# ### Part (d) -- 1 pt
#
# Use the Householder QR decomposition method to solve the system.
# Save the result in the variable `mnist_x`.
# Save the norm of the residuals of this solution in the variable `mnist_r`.
# In[ ]:
mnist_x = None
mnist_r = None
# ### Part (e) -- 1 pt
#
# Consider `test_images[0]`. Is this image of the digit 0? Set the value of `test_image_0` to either `True` or `False`
# depending on your result.
#
# Let $p$ be the vector containing the values of the 400 center pixels in `test_images[0]`. The features are extracted
# for you in the variable `p`. Use the solution `mnist_x` to estimate the target $y$ for the vector $p$.
# Save the (float) value of the predicted value of $y$ in the variable `test_image_0_y`.
# In[ ]:
# import matplotlib.pyplot as plt # Please comment before submitting
# plt.imshow(test_images[0]) # Please comment before submitting
p = test_images[0, 4:24, 4:24].reshape([-1, 20*20])
test_image_0 = None
test_image_0_y = None
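# A prediction sketch with hypothetical numbers (`demo_*` names are
# illustrative): the model's estimate of y for a single image is the dot
# product of its feature vector with the least squares coefficients.
# In[ ]:
import numpy as np
demo_p = np.array([0.0, 0.5, 1.0])      # stand-in feature vector
demo_x = np.array([0.2, 0.4, 0.6])      # stand-in coefficients
demo_y = float(np.dot(demo_p, demo_x))  # 0.0*0.2 + 0.5*0.4 + 1.0*0.6 = 0.8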
# ### Part (f) -- 2 pt
#
# Write code to predict the value of $y$ for **every** image in `test_images`. Save
# your result in `test_image_y`.
#
# **Do not use a loop.**
# In[ ]:
test_image_y = None
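# A loop-free sketch on hypothetical data (`demo_*` names are illustrative):
# stacking the feature rows into a matrix turns the per-image dot products
# into a single matrix-vector product.
# In[ ]:
import numpy as np
demo_A = np.array([[0.0, 0.5, 1.0],
                   [1.0, 0.0, 0.0]])  # two stand-in images
demo_x = np.array([0.2, 0.4, 0.6])    # stand-in coefficients
demo_ys = np.matmul(demo_A, demo_x)   # one estimate per image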
# ### Part (g) -- 1 pt
#
# We will want to turn the continuous estimates of $y$ into discrete predictions
# about whether an image is of the digit 0.
#
# We will do this by selecting a **cutoff**. That is, we will predict that
# a test image is of the digit 0 if the prediction $y$ for that image is at least $0.5$.
#
# Create a numpy array `test_image_pred` with `test_image_pred[i] == 1` if `test_images[i]`
# is of the digit 0, and `test_image_pred[i] == 0` otherwise. Then, run
# the code to compute `test_accuracy`, the fraction of predictions that
# match the actual labels.
#
# HINT: You might find the code in Part(c) helpful.
#
# (This is somewhat of an arbitrary cutoff. You will learn the proper way
# to do this prediction problem in a machine learning course like
# CSC411 or CSC321.)
# In[ ]:
test_image_pred = None
# test_accuracy = sum(test_image_pred == (test_labels == 0).astype(float)) / len(test_labels)
# print(test_accuracy)
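# A cutoff sketch on hypothetical estimates (`demo_*` names are
# illustrative): comparing against 0.5 yields booleans, and `astype(float)`
# turns them into the required 0.0/1.0 predictions.
# In[ ]:
import numpy as np
demo_ys = np.array([0.8, 0.2, 0.5])
demo_pred = (demo_ys >= 0.5).astype(float)
# demo_pred == array([1., 0., 1.])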
# ### Part (h) -- 4 pt
#
# So far, we built a linear least squares model that determines whether an image is of the digit 0.
# Let's go a step further, and build such a model for **every** digit!
#
# Complete the function `mnist_classifiers`, which uses `train_images` and `train_labels`
# together with the function `solve_normal_equation` to build a linear least squares model for
# every digit. The function should return a matrix `xs` of shape $10 \times 400$,
# with `xs[0]` matching `mnist_x` from earlier.
#
# **Make sure to comment out any code you use to test `mnist_classifiers`, or your code might not be testable.**
#
# This part of the code will be graded by your TA.
# In[ ]:
def mnist_classifiers():
"""Return the coefficients for linear least squares models for every digit.
Example:
>>> xs = mnist_classifiers()
>>> np.allclose(xs[0], mnist_x)
True
"""
# you can use train_images and train_labels here, and make
# whatever edits to this function as you wish.
A = train_images[:, 4:24, 4:24].reshape([-1, 20*20])
xs = []
# ...
return np.stack(xs)
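# A structural sketch on hypothetical data (the `demo_*` names and the use
# of `np.linalg.lstsq` are stand-ins; your function must use `train_images`,
# `train_labels`, and `solve_normal_equation`): fit one model per digit,
# each with its own 0/1 target vector, and stack the ten solutions.
# In[ ]:
import numpy as np
demo_A = np.random.RandomState(0).rand(30, 5)  # stand-in feature matrix
demo_labels = np.arange(30) % 10               # stand-in digit labels
demo_xs = np.stack([
    np.linalg.lstsq(demo_A, (demo_labels == d).astype(float), rcond=None)[0]
    for d in range(10)
])
# demo_xs.shape == (10, 5)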
# ### Just for fun...
#
# The code below makes predictions based on the result of your `mnist_classifiers`.
# That is, for every test image, the code runs all 10 models to see whether the
# test image contains each of the 10 digits. We make a discrete prediction about
# which digit the image contains by looking at which model yields the **largest**
# value of $y$ for the image.
#
# The code then compares the result against the actual labels and computes the
# accuracy measure: the fraction of predictions that are correct. Just for fun,
# look at the prediction accuracy of our model, but please comment out any code
# you write before submitting your assignment.
#
# Again, in machine learning and statistics courses you will learn
# better and more principled ways of classifying digits, and
# you'll achieve a much better test accuracy than what we have here.
# In[ ]:
def prediction_accuracy(xs):
    """Return the fraction of test images whose digit is predicted correctly.
    """
testA = test_images[:, 4:24, 4:24].reshape([-1, 20*20])
ys = np.matmul(testA, xs.T)
pred = np.argmax(ys, axis=1)
return sum(pred == test_labels) / len(test_labels)