{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Assignment 3\n",
"\n",
"Due Date: March 10, 8:59pm\n",
"\n",
"Please see the guidelines at https://www.cs.toronto.edu/~lczhang/338/homework.html\n",
"\n",
"\n",
"### What to Hand In\n",
"\n",
"Please hand in 2 files:\n",
"\n",
"- Python File containing all your code, named `csc338_a3.py`. If you are using Jupyter Notebook to complete the work, your notebook can be exported as a .py file (File -> Download As -> Python). Your code will be auto-graded using Python 3.6, so please make sure that your code runs. There will be a 20% penalty if you need a remark due to small issues that renders your code untestable.\n",
"- PDF file named `csc338_a3_written.pdf` containing your solutions to the written parts of the assignment. Your solution can be hand-written, but must be legible. Graders may deduct marks for illegible or poorly presented solutions.\n",
"\n",
"Submit the assignment on **MarkUs** by 9pm on the due date. See the syllabus for the course policy regarding late assignments. All assignments must be done individually."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Q1. Positive Definite Matrices [10 pt]\n",
"\n",
"### Part (a) [4 pt]\n",
"\n",
"Are the following matrices symmetric positive definite? Save each answer (`True` or `False`) in the variables `M1_pd`, ... `M4_pd`.\n",
"\n",
"You should be able to do this question by hand.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"M1 = np.eye(5)\n",
"M2 = np.array([[-3., 1.],\n",
" [ 1., 3.]])\n",
"M3 = np.array([[3., 1.],\n",
" [1., 3.]])\n",
"M4 = np.array([[3., 1.],\n",
" [1., 3.],\n",
" [0., 0.]])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"M1_pd = False\n",
"M2_pd = False\n",
"M3_pd = False\n",
"M4_pd = False"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (b) [6 pt]\n",
"\n",
"Suppose that $A$ is a symmetric positive definite matrix. Show that the function\n",
"\n",
"$$ ||x||_A = (x^T A x)^{\\frac{1}{2}} $$\n",
"\n",
"on a vector $x$ satisfies the three properties of a vector norm.\n",
"\n",
"Include your solution in your pdf writeup."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Q2. Cholesky Factorization [10 pt]\n",
"\n",
"Write a function `cholesky_factorize` that returns the Cholesky factorization of a matrix, according to its docstring"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def cholesky_factorize(A):\n",
" \"\"\"Return the Cholesky Factorization L of A, where\n",
" * A is an nxn symmetric, positive definite matrix\n",
" * L is lower triangular, with positive diagonal entries\n",
" * $A = LL^T$\n",
" \n",
" >>> M = np.array([[8., 3., 2.],\n",
" [3., 5., 1.],\n",
" [2., 1., 3.]])\n",
" >>> L = cholesky_factorize(M.copy())\n",
" >>> np.matmul(L, L.T)\n",
" array([[8., 3., 2.],\n",
" [3., 5., 1.],\n",
" [2., 1., 3.]])\n",
" \"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Q3. Rank-k Update Woodbury Formula [12 pt]\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (a) [6 pt]\n",
"\n",
"Recall that we proved the Sherman-Morrison formula in class:\n",
" \n",
"$$(A - uv^T)^{-1} = A^{-1} + A^{-1}u(1-v^TA^{-1}u)^{-1}v^T A^{-1}$$\n",
"\n",
"A related formula is the Woodbury formula for a rank-k update of the matrix $A$. Specifically, if $U$ and $V$ are $n \\times k$ matrices, then\n",
"\n",
"$$(A - UV^T)^{-1} = A^{-1} + A^{-1}U(I-V^TA^{-1}U)^{-1}V^T A^{-1}$$\n",
"\n",
"Prove that the Woodbury formula is correct. Include your solution in your pdf writeup."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (b) [6 pt]\n",
"\n",
"A rank 1 matrix has the form $xy^T$ where $x$ and $y$ are column vectors.\n",
"Suppose $A$ and $B$ are non-sigular matrices. Show that $A - B$ is rank 1\n",
"if and only if $A^{-1} - B^{-1}$ is also rank 1. \n",
"\n",
"Include your solution in your pdf writeup.\n",
"\n",
"Hint: Use the Sherman-Morrison formula. This question is not supposed to be easy, so leave aside time to think!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Q4. Iterative Refinement [6 pt]\n",
"\n",
"For this question, we will use the code from assignment 2."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def backward_substitution(A, b):\n",
" \"\"\"Return a vector x with np.matmul(A, x) == b, where \n",
" * A is an nxn numpy matrix that is upper-triangular and nonsingular\n",
" * b is an nx1 numpy vector\n",
" \"\"\"\n",
" n = A.shape[0]\n",
" x = np.zeros_like(b, dtype=np.float)\n",
" for i in range(n-1, -1, -1):\n",
" s = 0\n",
" for j in range(n-1, i, -1):\n",
" s += A[i,j] * x[j]\n",
" x[i] = (b[i] - s) / A[i,i]\n",
" return x\n",
"\n",
"def eliminate(A, b, k):\n",
" \"\"\"Eliminate the k-th row of A, in the system np.matmul(A, x) == b,\n",
" so that A[i, k] = 0 for i < k. The elimination is done in place.\"\"\"\n",
" n = A.shape[0]\n",
" for i in range(k + 1, n):\n",
" m = A[i, k] / A[k, k]\n",
" for j in range(k, n):\n",
" A[i, j] = A[i, j] - m * A[k, j]\n",
" b[i] = b[i] - m * b[k]\n",
"\n",
"def gauss_elimination(A, b):\n",
" \"\"\"Return a vector x with np.matmul(A, x) == b using\n",
" the Gauss Elimination algorithm, without partial pivoting.\"\"\"\n",
" for k in range(A.shape[0] - 1):\n",
" eliminate(A, b, k)\n",
" x = backward_substitution(A, b)\n",
" return x\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (a) [1 pt]\n",
"\n",
"Recall the following system for question 2(d) of assignment 2.\n",
"\n",
"When we tried to solve this system using gauss elimiation without pivoting, we obtained an answer `q4_x0` below that is incorrect. Compute the residual of `q4_x0`. Save the residual in the variable `q4_r0`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"e = pow(2, -100)\n",
"A = np.array([[e, 1],\n",
" [1, 1]])\n",
"b = np.array([1+e, 2])\n",
"\n",
"q4_x0 = gauss_elimination(A.copy(), b.copy())\n",
"q4_r0 = np.zeros(2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (b) [4 pt]\n",
"\n",
"We will use iterative refinement to produce a better solution `q4_x1` to the system $Ax = b$. \n",
"First, what is the new system that we need to solve? Save the $A$ and $b$ of the new system\n",
"in the variable `q4_A1` and `q4_b1`. \n",
"Compute the residual `q4_r1` of your new solution."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"q4_A1 = np.zeros([2,2]) # todo\n",
"q4_b1 = np.zeros(2) # todo\n",
"# gauss_elimination(q4_A1, q4_b1) \n",
"q4_x1 = np.zeros(2)\n",
"q4_r1 = np.zeros(2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (c) [1 pt]\n",
"\n",
"How does your new solution `q4_x1` compare to your old solution `q4_x0`?\n",
"\n",
"Store either \"better\" or \"worse\" in the variable `new_solution_is`, depending on whether you think `q4_x1` is a better or worse solution than `q4_x0` to the initial system $Ax = b$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"new_solution_is = None # either \"better\" or \"worse\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Q5. Normal Equation [10 pt]\n",
"\n",
"### Part (a) [5 pt]\n",
"\n",
"Write a function `solve_normal_equation` that takes an $m \\times n$ matrix $A$ and a vector $b$,\n",
"and solves the linear least squares problem $A x \\approx b$ using the Normal Equation method.\n",
"\n",
"You should use the `cholesky_factorize` method that you wrote in question 3. You may need\n",
"other functions that you wrote in assignment 2."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def solve_normal_equation(A, b):\n",
" \"\"\"\n",
" Return the solution x to the linear least squares problem\n",
" $$Ax \\approx b$$ using normal equations,\n",
" where A is an (m x n) matrix, with m > n, rank(A) = n, and\n",
" b is a vector of size (m)\n",
" \"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (b) [2 pt]\n",
"\n",
"Consider the linear least square system below, where $A$ is a $50 \\times 3$ matrix,\n",
"and $b$ is a vector of size $50$. The system is derived from the \"iris\" dataset.\n",
"\n",
"Use the `solve_normal_equation` function you wrote in part (a) to find the vector `x`\n",
"that minimizes the norm of `matmul(A, x) - b`.\n",
"\n",
"Store the solution in the variable `q5b_x` and the residual in `q5b_r`. You may use\n",
"the function `np.linalg.norm`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"irisA = np.array([[4.8, 6.3, 5.7, 5.1, 7.7, 5.6, 4.9, 4.4, 6.4, 6.2, 6.7, 4.5, 6.3,\n",
" 4.8, 5.8, 4.7, 5.4, 7.4, 6.4, 6.3, 5.1, 5.7, 6.5, 5.5, 7.2, 6.9,\n",
" 6.8, 6. , 5. , 5.4, 5.6, 6.1, 5.9, 5.6, 6. , 6. , 4.4, 6.9, 6.7,\n",
" 5.1, 6. , 6.3, 4.6, 6.7, 5. , 6.7, 5.8, 5.1, 5.2, 6.1],\n",
" [3.1, 2.5, 3. , 3.7, 3.8, 3. , 3.1, 2.9, 2.9, 2.9, 3.1, 2.3, 2.5,\n",
" 3.4, 2.7, 3.2, 3.7, 2.8, 2.8, 3.3, 2.5, 4.4, 3. , 2.6, 3.2, 3.2,\n",
" 3. , 2.9, 3.6, 3.9, 3. , 2.6, 3.2, 2.8, 2.7, 2.2, 3.2, 3.1, 3.1,\n",
" 3.8, 2.2, 2.7, 3.1, 3. , 3.5, 2.5, 4. , 3.5, 3.4, 3. ],\n",
" [1.6, 4.9, 4.2, 1.5, 6.7, 4.5, 1.5, 1.4, 4.3, 4.3, 5.6, 1.3, 5. ,\n",
" 1.6, 5.1, 1.3, 1.5, 6.1, 5.6, 6. , 3. , 1.5, 5.8, 4.4, 6. , 5.7,\n",
" 5.5, 4.5, 1.4, 1.3, 4.1, 5.6, 4.8, 4.9, 5.1, 5. , 1.3, 4.9, 4.4,\n",
" 1.5, 4. , 4.9, 1.5, 5.2, 1.3, 5.8, 1.2, 1.4, 1.4, 4.9]])\n",
"irisb = np.array([0.2, 1.5, 1.2, 0.4, 2.2, 1.5, 0.1, 0.2, 1.3, 1.3, 2.4, 0.3, 1.9,\n",
" 0.2, 1.9, 0.2, 0.2, 1.9, 2.1, 2.5, 1.1, 0.4, 2.2, 1.2, 1.8, 2.3,\n",
" 2.1, 1.5, 0.2, 0.4, 1.3, 1.4, 1.8, 2. , 1.6, 1.5, 0.2, 1.5, 1.4,\n",
" 0.3, 1. , 1.8, 0.2, 2.3, 0.3, 1.8, 0.2, 0.2, 0.2, 1.8])\n",
"A = irisA.T # transpose\n",
"b = irisb"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"q5b_x = None\n",
"q5b_r = None"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (c) [1 pt]\n",
"\n",
"Consider an alternative value of $x$ for the system $Ax \\approx b$\n",
"defined in part (b), with `x = np.array([-0.2, 0.2, 0.5])`.\n",
"Compute the norm of the residual $Ax - b$ for this alternate value of $x$,\n",
"and save the result in the variable `q5c_r`.\n",
"\n",
"You should see that `q5b_r < q5c_r`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = np.array([-0.2, 0.2, 0.5])\n",
"q5c_r = None"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (d) [2 pt]\n",
"\n",
"Show that if an $m \\times n$ matrix $A$ with $m > n$ has $rank(n)$, then $A^T A$ is positive definite."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Q6. Householder Transforms [17 pt]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (a) [2 pt]\n",
"\n",
"Consider the following linear least squares problem. What is the minimum\n",
"value of $||Ax - b||_2$? Save the minimum norm of the \n",
"residual in the variable `q5a_r`.\n",
"\n",
"What is the solution to the problem? Save the value in the variable `q5a_x`.\n",
"\n",
"Do this entire problem by hand. You should verify that the residual matches\n",
"the value you stored in `q5_r`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"A = np.array([[1., 2., 3.],\n",
" [0., 3., 4.],\n",
" [0., 0., 6.],\n",
" [0., 0., 0.],\n",
" [0., 0., 0.]])\n",
"b = np.array([[1., 2., 3., 4., 5.]])\n",
"\n",
"q5a_r = 0\n",
"q5a_x = np.array([0., 0., 0.])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (b) [5 pt]\n",
"\n",
"Write a function `householder_v` that returns the vector $v$ that defines\n",
"the Householder transform\n",
"\n",
"$$H = I - 2 \\frac{v v^T} {v^T v}$$\n",
"\n",
"that eliminates all but the first element of a vector $a$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def householder_v(a):\n",
" \"\"\"Return the vector $v$ that defines the Householder Transform\n",
" H = I - 2 np.matmul(v, v.T) / np.matmul(v.T, v)\n",
" that will eliminate all but the first element of the \n",
" input vector a.\n",
" \n",
" Example:\n",
" >>> householder_v(np.array([2., 1., 2.]))\n",
" array([5., 1., 2.])\n",
" \"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (c) [3 pt]\n",
"\n",
"Write a function `apply_householder` that applies the Householder\n",
"transform defined by a vector $v$ to a vector $u$. You should **not**\n",
"compute the Householder transform matrix $H$. You should only \n",
"need to compute vector-vector dot products and vector-scalar multiplications."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def apply_householder(v, u):\n",
" \"\"\"Return the result of the Householder transformation defined\n",
" by the vector $v$ applied to the vector $u$. You should not\n",
" compute the Householder matrix H directly.\n",
" \n",
" Example:\n",
" \n",
" >>> apply_householder(np.array([5., 1., 2.]), np.array([2., 1., 2.]))\n",
" array([-3., 0., 0.])\n",
" >>> apply_householder(np.array([5., 1., 2.]), np.array([2., 3., 4.]))\n",
" array([-5. , 1.6, 1.2])\n",
" \"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (d) [3 pt]\n",
"\n",
"Write a function `apply_householder_matrix` that applies the Householder\n",
"transform defined by a vector $v$ to all the columns of a matrix $U$. \n",
"You should **not** compute the Householder transform matrix $H$. \n",
"\n",
"**Do not use for loops.** Instead, you may find the numpy function `np.outer` useful."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def apply_householder_matrix(v, U):\n",
" \"\"\"Return the result of the Householder transformation defined\n",
" by the vector $v$ applied to all the vectors in the matrix U.\n",
" You should not compute the Householder matrix H directly.\n",
" \n",
" Example:\n",
" \n",
" >>> v = np.array([5., 1., 2.])\n",
" >>> U = np.array([[2., 2.],\n",
" [1., 3.], \n",
" [2., 4.]])\n",
" >>> apply_householder_matrix(v, U)\n",
" array([[-3. , -5. ],\n",
" [ 0. , 1.6],\n",
" [ 0. , 1.2]])\n",
" \"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (e) [4 pt]\n",
"\n",
"Write a function `solve_qr_householder` that takes an $m \\times n$ matrix $A$ and a vector $b$,\n",
"and solves the linear least squares problem $A x \\approx b$ using Householder QR Factorization.\n",
"You may use `np.linalg.solve` to solve any square system of the form $Ax = b$ that you produce.\n",
"\n",
"You should use the helper function `qr_householder` that takes a matrix A and a vector b and\n",
"performs the Householder QR Factorization using the functions you wrote in parts (b-d)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def qr_householder(A, b):\n",
" \"\"\"Return the matrix [R O]^T, and vector [c1 c2]^T equivalent\n",
" to the system $Ax \\approx b$. This algorithm is similar to\n",
" Algorithm 3.1 in the textbook.\n",
" \"\"\"\n",
" for k in range(A.shape[1]):\n",
" v = householder_v(A[k:, k])\n",
" if np.linalg.norm(v) != 0:\n",
" A[k:, k:] = apply_householder_matrix(v, A[k:, k:])\n",
" b[k:] = apply_householder(v, b[k:])\n",
" # now, A is upper-triangular\n",
" return A, b # A is uppe\n",
"\n",
"def solve_qr_householder(A, b):\n",
" \"\"\"\n",
" Return the solution x to the linear least squares problem\n",
" $$Ax \\approx b$$ using Householder QR decomposition.\n",
" Where A is an (m x n) matrix, with m > n, rank(A) = n, and\n",
" b is a vector of size (m)\n",
" \"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (f) [2 pt]\n",
"\n",
"Use the function `solve_qr_householder` to solve the Iris system from Q5(b).\n",
"\n",
"Save your result for $x$ in the variable `q6f_x`. \n",
"\n",
"Do you get the same result or a different result from Q5(b)? Change the value\n",
"of the variable `q6f_compare` to either the string \"same\" or \"different\" depending on\n",
"whether your results were the same or different."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"A = irisA.T # transpose\n",
"b = irisb\n",
"q6f_x = None\n",
"q6f_compare = None"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (g) [4 pt]\n",
"\n",
"Show that for $H = I - 2 \\frac{v v^T}{v^T v}$, we have $H = H^T = H^{-1}$.\n",
"\n",
"Include your solution in your PDF writeup."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Q7. MNIST Digit Recognition [15pt]\n",
"\n",
"For the next few questions, we will use the MNIST dataset for digit recognition\n",
"Download the files `mnist_images.npy` and `mnist_labels.npy` from the course\n",
"website, and place them into the same folder as your ipynb file.\n",
"\n",
"The code below loads the data, splits it into \"train\" and \"test\" sets,\n",
"and plots the a subset of the data. We will use `train_images` and `train_labels`\n",
"to set up Linear Least Squares problems. We will use `test_images` and `test_labels`\n",
"to test the models that we build."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mnist_images = np.load(\"mnist_images.npy\")\n",
"test_images = mnist_images[4500:] # 500 images\n",
"train_images = mnist_images[:4500] # 4500 images\n",
"mnist_labels = np.load(\"mnist_labels.npy\")\n",
"test_labels = mnist_labels[4500:]\n",
"train_labels = mnist_labels[:4500]\n",
"\n",
"def plot_mnist(remove_border=False):\n",
" \"\"\" Plot the first 40 data points in the MNIST train_images. \"\"\"\n",
" import matplotlib.pyplot as plt\n",
" plt.figure(figsize=(10, 5))\n",
" for i in range(4 * 10):\n",
" plt.subplot(4, 10, i+1)\n",
" if remove_border:\n",
" plt.imshow(train_images[i,4:24,4:24])\n",
" else:\n",
" plt.imshow(train_images[i])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# plot_mnist() # please comment out this line before submission"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (a) [2 pt]\n",
"\n",
"How many examples of each digit are in `train_images`? You should use the\n",
"information in `train_labels`. \n",
"\n",
"If you are not well versed in numpy, some of the code in the later part of this\n",
"question might be helpful. You might also find the `sum` function helpful.\n",
"\n",
"Save your result in the array `mnist_digits`, where \n",
"`mnist_digits[0]` should contain the number of digit `0` in `train_images`, \n",
"`mnist_digits[1]` should contain the number of digit `1` in `train_images`, etc."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mnist_digits = []"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (b) [2 pt]\n",
"\n",
"We will build a rudimentary model to predict whether a digit is the\n",
"digit `0`. Our features will be the intensity at each pixel.\n",
"There are 28 * 28 = 784 pixels in each image. However, in order\n",
"to obtain a matrix $A$ that is of full rank, we will ignore the\n",
"pixels along the border. That is, we will only use 400 pixels\n",
"in the center of the image.\n",
"\n",
"Look at a few of the MNIST images using the `plot_mnist` function\n",
"written for you. Why would our matrix $A$ not be full rank if we \n",
"use all 784 pixels in our model?\n",
"\n",
"If this question doesn't make sense yet, you might want to\n",
"come back to it after completing the rest of this question. \n",
"\n",
"Include your solution in your PDF writeup."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Plot the MNIST images, with only the center pixels that we will use\n",
"# in our digit classification model.\n",
"#plot_mnist(True) # please comment out this line before submission"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (c) [1 pt]\n",
"\n",
"We will now build a rudimentary model to predict whether a digit is the\n",
"digit `0`. To obtain a matrix of full rank, we will use the 400 pixels\n",
"at the center of each image as features. Our target will\n",
"be whether our digit is `0`.\n",
"\n",
"In short, the model we will build looks like this:\n",
"\n",
"$$x_1 p_1 + x_2 p_2 + ... + x_{400} p_{400} = y$$\n",
"\n",
"Where $p_i$ is the pixel intensity at pixel $i$ (the ordering of the pixel's\n",
"doesn't actually matter), and the value of $y$ determines whether our digit is a `0` or not.\n",
"\n",
"We will solve for the coefficients $x_i$ by solving the linear\n",
"least squares problem $Ax \\approx b$, where $A$ is constructed using\n",
"the pixel intensities of the images in `train_images`, and $y$ is constructed\n",
"using the labels for those images. For convenience, we will set $y = 1$ for\n",
"images of the digit `0`, and $y=0$ for other digits.\n",
"\n",
"**I should stress that in CSC411, you will learn that this is not the\n",
"proper way to build a digit detector.** However, digit detection is quite fun,\n",
"so we might as well use to tools that we have to try and solve the problem.\n",
"\n",
"The code below obtains the matrices $A$ and the vector $b$ of our least squares\n",
"problem, where $A$ is a $m \\times n$ matrix and $b$ is a vector of length $m$.\n",
"\n",
"What is the value of $m$ and $n$? Save the values in the variables `mnist_m` and `mnist_n`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"A = train_images[:, 4:24, 4:24].reshape([-1, 20*20])\n",
"b = (train_labels == 0).astype(np.float32)\n",
"\n",
"mnist_m = None\n",
"mnist_n = None"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (d) [2 pt]\n",
"\n",
"Use the Normal Equations method to solve the system $Ax=b$. Save the result in the variable `mnist_x1`. Save the norm of the residuals of this solution in the variable `mnist_r1`.\n",
"\n",
"Then, use the Householder QR decomposition method to solve the same system. Save the result in the variable `mnist_x2`. Save the norm of the residuals of this solution in the variable `mnist_r2`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"mnist_x1 = None\n",
"mnist_r1 = None\n",
"mnist_x2 = None\n",
"mnist_r2 = None"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (e) [1 pt]\n",
"\n",
"Consider `test_images[0]`. Is this image of the digit 0? Set the value of `test_image_0` to either `True` or `False`\n",
"depending on your result.\n",
"\n",
"Let $p$ be the vector containing the values of the 400 center pixels in `test_image[0]`. The features are extracted\n",
"for you in the variabel `p`. Use the solution `mnist_x1` to estimate the target $y$ for the vector $p$.\n",
"Save the (float) value of the predicted value of $y$ in the variable `test_image_0_y`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# import matplotlib.pyplot as plt # Please comment before submitting\n",
"# plt.imshow(test_images[0]) # Please comment before submitting"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"p = test_images[0, 4:24, 4:24].reshape([-1, 20*20])\n",
"test_image_0 = None\n",
"test_image_0_y = None"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (f) [2 pt]\n",
"\n",
"Write code to predict the value of $y$ for **every** image in `test_images`. Save\n",
"your result in `test_image_y`.\n",
"\n",
"**Do not use a loop.** "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test_image_y = None"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (g) [2 pt]\n",
"\n",
"We will want to turn the continuous estimates of $y$ into discrete predictions\n",
"about whether an image is of the digit 0.\n",
"\n",
"We will do this by selecting a **cutoff**. That is, we will predict that\n",
"a test image is of the digit 0 if the prediction $y$ for that digit is at least $0.5$.\n",
"\n",
"Create a numpy array `test_image_pred` with `test_image_pred[i] == 1` if `test_image[i]`\n",
"is of the digit 0, and `test_image_pred[i] == 0` otherwise. Then, run\n",
"the code to compute the `test_accuracy`, or the portion of the times that\n",
"our prediction matches the actual label.\n",
"\n",
"HINT: You might find the code in Part(c) helpful.\n",
"\n",
"(This is somewhat of an arbitrary cutoff. You will learn the proper way\n",
" to do this prediction problem in CSC411.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test_image_pred = None"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test_accuracy = sum(test_image_pred == (test_labels == 0).astype(float)) / len(test_labels)\n",
"test_accuracy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part (h) [3 pt]\n",
"\n",
"So far, we built a linear least squares model that determines whether an image is of the digit 0.\n",
"Let's go a step further, and build such a model for **every** digit!\n",
"\n",
"Complete the function `mnist_classifiers` that uses `train_images` and `train_labels`,\n",
"and uses the function `solve_normal_equation` to build a linear least squares model for\n",
"every digit. The function should return a matrix $xs$ of shape $10 \\times 400$,\n",
"with `xs[0] == mnist_x1` from earlier.\n",
"\n",
"Please comment out any code you use to test `mnist_classifier`. It will help make testing\n",
"your code faster, so I can get your assignments back to you more quickly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def mnist_classifiers():\n",
" \"\"\"Return the coefficients for linear least squares models for every digit.\n",
" \n",
" Example:\n",
" >>> xs = mnist_classifiers()\n",
" >>> np.all(xs[0] == mnist_x1)\n",
" True\n",
" \"\"\"\n",
" # you can use train_images and train_labels here, and make\n",
" # whatever edits to this function as you wish.\n",
" A = train_images[:, 4:24, 4:24].reshape([-1, 20*20])\n",
" xs = []\n",
" # ...\n",
" return np.stack(xs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Just for fun...\n",
"\n",
"The code below makes predictions based on the result of your `mnist_classifier`.\n",
"That is, for every test image, the code runs all 10 models to see whether the\n",
"test image contains each of the 10 digits. We make a discrete prediction about\n",
"which digit the image contains by looking at which model yields the **largest**\n",
"value of $y$ for the image.\n",
"\n",
"The code then compares the result against the actual labels, computes the\n",
"accuracy measure: the fraction of predictions that is correct. Just for fun,\n",
"look at the prediction accuracy of our model, but please comment any code you\n",
"write before submitting your assignment.\n",
"\n",
"Again, in CSC411 you will learn much better ways of classifying digits.\n",
"You will achieve a much better test accuracy than what we have here."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def prediction_accuracy(xs):\n",
" \"\"\"Return the prediction \n",
" \"\"\"\n",
" testA = test_images[:, 4:24, 4:24].reshape([-1, 20*20])\n",
" ys = np.matmul(testA, xs.T)\n",
" pred = np.argmax(ys, axis=1)\n",
" return sum(pred == test_labels) / len(test_labels)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}