## Linear Regression with Gradient Descent

We'll consider the data on the radial velocity of Galaxy NGC7531

http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/galaxy.info.txt

SUMMARY:
The galaxy data frame records the radial velocity of a
spiral galaxy measured at 323 points in the area of sky
which it covers. All the measurements lie within seven
slots crossing at the origin. The positions of the
measurements are given by four variables (columns).

DATA DESCRIPTION:
- east.west: the east-west coordinate. The origin, (0,0), is near
  the center of the galaxy; east is negative, west is positive.
- north.south: the north-south coordinate. The origin, (0,0), is near
  the center of the galaxy; south is negative, north is positive.
- angle: degrees of counter-clockwise rotation from the horizontal
  of the slot within which the observation lies.
- radial.position: signed distance from the origin; negative if the
  east-west coordinate is negative.
- velocity: radial velocity measured in km/sec.

SOURCE:
Buta, R. (1987)  The  Structure  and  Dynamics  of  Ringed
Galaxies,  III:  Surface  Photometry and Kinematics of the
Ringed Nonbarred Spiral NGC7531.    The  Astrophysical  J.
Supplement Ser. Vol. 64, pp. 1--37.

John M. Chambers and Trevor J. Hastie, (eds.)  Statistical
Models in S, Wadsworth and Brooks, Pacific Grove, CA 1992,
pg. 352.
In [1]:
import pandas as pd
from numpy import *
from numpy.linalg import norm

# Load the data. The path and delimiter below are assumptions: the data file
# is presumed to live alongside galaxy.info.txt on the ElemStatLearn site;
# sep=None lets pandas sniff the delimiter.
dat = pd.read_csv("http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/galaxy.data",
                  sep=None, engine="python")

x1 = dat.loc[:, "east.west"].to_numpy()    # .as_matrix() was removed in pandas 1.0
x2 = dat.loc[:, "north.south"].to_numpy()
y = dat.loc[:, "velocity"].to_numpy()


Now, let's write the cost function and the gradient of the cost function.
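With a row of ones prepended to $x$ so that $\theta_0$ serves as the intercept, the cost being minimized and its gradient are

$$f(\theta) = \sum_{i=1}^{n} \left(y^{(i)} - \theta^T x^{(i)}\right)^2, \qquad \nabla_\theta f = -2 \sum_{i=1}^{n} \left(y^{(i)} - \theta^T x^{(i)}\right) x^{(i)}.$$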

In [2]:
def f(x, y, theta):
    """Sum-of-squares cost: sum over points of (y - theta^T x)^2."""
    x = vstack((ones((1, x.shape[1])), x))  # prepend a row of ones for the intercept
    return sum((y - dot(theta.T, x)) ** 2)

def df(x, y, theta):
    """Gradient of f with respect to theta."""
    x = vstack((ones((1, x.shape[1])), x))
    return -2 * sum((y - dot(theta.T, x)) * x, 1)



The gradient descent algorithm is the same:

In [3]:
def grad_descent(f, df, x, y, init_t, alpha):
    EPS = 1e-5
    prev_t = init_t - 10 * EPS
    t = init_t.copy()
    max_iter = 30000
    iter = 0
    while norm(t - prev_t) > EPS and iter < max_iter:
        prev_t = t.copy()
        t -= alpha * df(x, y, t)
        if iter % 500 == 0:
            print("Iter", iter)
            print("x = (%.2f, %.2f, %.2f), f(x) = %.2f" % (t[0], t[1], t[2], f(x, y, t)))
            print("Gradient: ", df(x, y, t), "\n")
        iter += 1
    return t
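
The step size `alpha` matters: too large and the iterates diverge instead of converging, which is why a tiny `alpha` is used below. A minimal 1-D illustration (not part of the notebook) on f(t) = t², where the gradient step is t ← (1 − 2α)t:

```python
def descend(alpha, steps=50, t0=1.0):
    """Run gradient descent on f(t) = t**2 (gradient 2t) starting from t0."""
    t = t0
    for _ in range(steps):
        t -= alpha * 2 * t
    return t

print(descend(0.1))   # |1 - 2*alpha| = 0.8 < 1: shrinks toward 0
print(descend(1.5))   # |1 - 2*alpha| = 2 > 1: blows up
```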


Let's check the gradient to make sure it works:

In [4]:
x = vstack((x1, x2))
theta = array([-3, 2, 1])

h = 0.000001
print((f(x, y, theta + array([0, h, 0])) - f(x, y, theta - array([0, h, 0]))) / (2 * h))
print(df(x, y, theta))

217161.297798
[-1030866.79673168   217161.22653259   -29883.8912257 ]
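
The finite-difference check above can be packaged as a reusable helper. A sketch (the `check_grad` name and signature are mine, not part of the notebook; it takes functions of `theta` alone):

```python
import numpy as np

def check_grad(f, df, theta, h=1e-6):
    """Max absolute gap between df(theta) and central differences of f."""
    approx = np.zeros_like(theta, dtype=float)
    for i in range(theta.size):
        e = np.zeros_like(theta, dtype=float)
        e[i] = h
        approx[i] = (f(theta + e) - f(theta - e)) / (2 * h)
    return np.max(np.abs(approx - df(theta)))

# Example on f(t) = t.t with gradient 2t (central differences are exact
# for quadratics, so the gap is pure floating-point noise):
err = check_grad(lambda t: t @ t, lambda t: 2 * t, np.array([1.0, -2.0, 3.0]))
print(err)
```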

In [5]:
x = vstack((x1, x2))
theta0 = array([0., 0., 0.])
theta = grad_descent(f, df, x, y, theta0, 0.0000010)

Iter 0
x = (1.03, -0.05, 0.32), f(x) = 822017775.24

Iter 500
x = (440.65, 0.38, -0.29), f(x) = 432219704.26

Iter 1000
x = (759.38, 0.92, -1.19), f(x) = 227355448.41

Iter 1500
x = (990.47, 1.30, -1.83), f(x) = 119666065.36

Iter 2000
x = (1158.01, 1.58, -2.30), f(x) = 63057833.85

Iter 2500
x = (1279.49, 1.79, -2.64), f(x) = 33301029.76

Iter 3000
x = (1367.56, 1.93, -2.89), f(x) = 17659004.46

Iter 3500
x = (1431.41, 2.04, -3.07), f(x) = 9436583.97

Iter 4000
x = (1477.71, 2.12, -3.20), f(x) = 5114368.82

Iter 4500
x = (1511.28, 2.17, -3.29), f(x) = 2842343.96

Iter 5000
x = (1535.61, 2.21, -3.36), f(x) = 1648026.51

Iter 5500
x = (1553.26, 2.24, -3.41), f(x) = 1020219.04

Iter 6000
x = (1566.05, 2.26, -3.44), f(x) = 690204.43

Iter 6500
x = (1575.33, 2.28, -3.47), f(x) = 516728.24

Iter 7000
x = (1582.05, 2.29, -3.49), f(x) = 425538.37

Iter 7500
x = (1586.93, 2.30, -3.50), f(x) = 377603.31

Iter 8000
x = (1590.46, 2.31, -3.51), f(x) = 352405.67

Iter 8500
x = (1593.02, 2.31, -3.52), f(x) = 339160.22

Iter 9000
x = (1594.88, 2.31, -3.53), f(x) = 332197.59

Iter 9500
x = (1596.23, 2.32, -3.53), f(x) = 328537.60

Iter 10000
x = (1597.21, 2.32, -3.53), f(x) = 326613.68

Iter 10500
x = (1597.91, 2.32, -3.53), f(x) = 325602.35

Iter 11000
x = (1598.43, 2.32, -3.54), f(x) = 325070.73

Iter 11500
x = (1598.80, 2.32, -3.54), f(x) = 324791.28

Iter 12000
x = (1599.07, 2.32, -3.54), f(x) = 324644.38

Iter 12500
x = (1599.26, 2.32, -3.54), f(x) = 324567.16

Iter 13000
x = (1599.41, 2.32, -3.54), f(x) = 324526.57

Iter 13500
x = (1599.51, 2.32, -3.54), f(x) = 324505.23

Iter 14000
x = (1599.58, 2.32, -3.54), f(x) = 324494.02

Iter 14500
x = (1599.64, 2.32, -3.54), f(x) = 324488.12

Iter 15000
x = (1599.68, 2.32, -3.54), f(x) = 324485.02

Iter 15500
x = (1599.71, 2.32, -3.54), f(x) = 324483.39

Iter 16000
x = (1599.73, 2.32, -3.54), f(x) = 324482.54

Iter 16500
x = (1599.74, 2.32, -3.54), f(x) = 324482.09

Iter 17000
x = (1599.75, 2.32, -3.54), f(x) = 324481.85

Iter 17500
x = (1599.76, 2.32, -3.54), f(x) = 324481.72

In [6]:
theta

array([ 1599.7805884 ,     2.32128786,    -3.53935822])
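
Gradient descent should land on the ordinary least-squares solution, which can also be computed in closed form via `np.linalg.lstsq`. A quick sanity check on synthetic data (a sketch — the galaxy data itself is not bundled here, so the coefficients below are made up to resemble the fit above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
true_theta = np.array([1600.0, 2.3, -3.5])   # chosen to resemble the fit above
X = np.column_stack([np.ones(n), x1, x2])    # design matrix with intercept column
y = X @ true_theta + rng.normal(scale=0.1, size=n)

# Closed-form least squares
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Plain gradient descent on sum((y - X @ theta)**2)
theta_gd = np.zeros(3)
alpha = 5e-4
for _ in range(20000):
    theta_gd -= alpha * (-2 * X.T @ (y - X @ theta_gd))

print(theta_ls)
print(theta_gd)  # agrees with theta_ls to many decimal places
```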