Create and save the file p3.R on your computer. Once you have completed the required functions, submit p3.R, which should contain all of them, on Blackboard. You must work with one partner, and you should work collaboratively. Both partners are responsible for being able to explain the work that’s been done to the preceptor. Additionally, work submitted on Blackboard will be graded for correctness. Only one of the partners should submit the code on Blackboard.
The file p3.R should include the following code, with the NetIDs of the two partners substituted in:
student.netid.1 <- "netid1"
student.netid.2 <- "netid2"
assignment <- "precept3"
In addition, a comment should contain the full names of the partners.
Some of you will be tempted to use for- and while-loops to solve some of the problems below (if you’ve used those before). Please don’t do this – the goal here is to try to use R the way professional data scientists use it, which usually means no loops.
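As a small illustration of loop-free style (the vector x and the variable names are made up for this example), arithmetic in R applies elementwise to whole vectors, and sapply applies a function to each element of a vector:

```r
x <- c(1, 2, 3, 4)          # hypothetical input vector

squares <- x^2              # elementwise: 1 4 9 16
total   <- sum(x^2)         # 30, with no explicit loop

# sapply calls the function once per element and collects the results
halves <- sapply(x, function(v) v / 2)   # 0.5 1.0 1.5 2.0
```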
Write a function called MostAccuratePred that takes in a dataset in the same format as gapminder %>% filter(year == 1982) (i.e., the year is always the same), and finds the 10 countries for which the predictions based on log(gdpPercap) are the most accurate.
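One possible shape for such a function is sketched below. It makes several assumptions the problem statement does not fix: that the quantity being predicted is lifeExp, that the model is a linear fit on log(gdpPercap), and that "most accurate" means smallest absolute residual. The name MostAccuratePredSketch, the argument n, and the toy data frame are hypothetical, and the sketch uses base R only.

```r
# Sketch only: assumes the outcome is lifeExp and accuracy is |residual|.
MostAccuratePredSketch <- function(dat, n = 10) {
  fit <- lm(lifeExp ~ log(gdpPercap), data = dat)
  abs.err <- abs(residuals(fit))      # one absolute error per row of dat
  idx <- order(abs.err)               # row indices, smallest error first
  head(as.character(dat$country[idx]), n)
}

# Made-up data with the same column names as gapminder
toy <- data.frame(
  country   = c("A", "B", "C", "D", "E"),
  gdpPercap = c(500, 1000, 2000, 4000, 8000),
  lifeExp   = c(45, 50, 55, 60, 80)
)
MostAccuratePredSketch(toy, n = 3)
```

With a real single-year gapminder subset, the default n = 10 would return ten country names.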
Review the prediction errors (what is a good systematic way of doing that?). Do you notice any patterns? Discuss your hypotheses about the patterns, if any, with the preceptors.
In this problem, you will write a function that finds good coefficients for linear regression.
Write a function called my.lm that could be used to find the coefficient in Simple Linear Regression. The function could be used like this:
my.data <- data.frame(X = c(1, 2, 3),
Y = c(3.1, 4.9, 7.05))
my.lm(my.data, intercept) # Returns approximately 2, since Y ~ 2*X + 1
The function should work as follows. First, generate possible values for the coefficient using
seq(-5, 5, 0.1)
## [1] -5.0 -4.9 -4.8 -4.7 -4.6 -4.5 -4.4 -4.3 -4.2 -4.1 -4.0 -3.9 -3.8 -3.7
## [15] -3.6 -3.5 -3.4 -3.3 -3.2 -3.1 -3.0 -2.9 -2.8 -2.7 -2.6 -2.5 -2.4 -2.3
## [29] -2.2 -2.1 -2.0 -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1 -1.0 -0.9
## [43] -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5
## [57] 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9
## [71] 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3
## [85] 3.4 3.5 3.6 3.7 3.8 3.9 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7
## [99] 4.8 4.9 5.0
Then, of those possible coefficients, find the one that produces the smallest error.
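The two steps above can be sketched as follows. This is one possible reading, not a reference solution: it assumes the second argument is a fixed intercept and that "error" means the sum of squared errors. The name my.lm.sketch is hypothetical.

```r
# Sketch only: grid search over candidate coefficients for a fixed intercept.
my.lm.sketch <- function(my.data, intercept) {
  coefs <- seq(-5, 5, 0.1)
  # Sum of squared errors for each candidate coefficient (assumed error measure)
  sse <- sapply(coefs, function(b) {
    sum((my.data$Y - (intercept + b * my.data$X))^2)
  })
  coefs[which.min(sse)]               # candidate with the smallest error
}

my.data <- data.frame(X = c(1, 2, 3),
                      Y = c(3.1, 4.9, 7.05))
my.lm.sketch(my.data, intercept = 1)  # approximately 2
```

Because seq builds the grid in floating point, compare the result to 2 with a tolerance rather than with ==.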
Try different inputs, and make sure that the answer you are getting is close to the answer you would expect. Explain to your preceptor how you came up with the different inputs.
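One way to come up with inputs whose answer is known in advance is to generate data from a chosen line plus a little noise. The sketch below does that and uses lm as an independent reference for the fixed-intercept slope. All names and the values 1 and 2.5 are arbitrary choices for illustration.

```r
set.seed(1)                           # make the example reproducible
true.intercept <- 1
true.coef <- 2.5

# Synthetic inputs: Y = 1 + 2.5 * X plus small Gaussian noise
test.data <- data.frame(X = runif(50, 0, 10))
test.data$Y <- true.intercept + true.coef * test.data$X + rnorm(50, sd = 0.1)

# Reference slope when the intercept is held fixed at its true value:
# subtract the intercept from Y and fit a line through the origin.
reference <- coef(lm(I(Y - true.intercept) ~ 0 + X, data = test.data))
reference
```

A grid-search answer for the same data and intercept should land within one grid step (0.1) of this reference value.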
Now, write a function that finds both a good intercept and a good coefficient. Hint: use a modification of the function in 2(a) that returns both the coefficient and the sum of squared errors it produces. Then repeatedly use that function for every possible intercept hypothesis.
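The hint could be followed along these lines. The sketch assumes the intercept candidates come from the same grid, seq(-5, 5, 0.1), which the problem does not specify. The names best.coef.sse and my.lm2.sketch are hypothetical.

```r
# Helper: best coefficient for one fixed intercept, plus the SSE it achieves.
best.coef.sse <- function(intercept, my.data) {
  coefs <- seq(-5, 5, 0.1)
  sse <- sapply(coefs, function(b) {
    sum((my.data$Y - (intercept + b * my.data$X))^2)
  })
  c(coef = coefs[which.min(sse)], sse = min(sse))
}

# Sketch only: try every intercept on an assumed grid, keep the best pair.
my.lm2.sketch <- function(my.data) {
  intercepts <- seq(-5, 5, 0.1)
  # 2 x length(intercepts) matrix with rows "coef" and "sse"
  res <- sapply(intercepts, best.coef.sse, my.data = my.data)
  best <- which.min(res["sse", ])
  c(intercept = intercepts[best], coef = unname(res["coef", best]))
}

my.data <- data.frame(X = c(1, 2, 3),
                      Y = c(3.1, 4.9, 7.05))
my.lm2.sketch(my.data)   # close to intercept 1 and coefficient 2
```

The outer sapply plays the role of "repeatedly use that function for every possible intercept hypothesis" without an explicit loop.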
In class, we ran the following:
fit <- lm(gdpPercap ~ continent, data = gapminder)
fit
##
## Call:
## lm(formula = gdpPercap ~ continent, data = gapminder)
##
## Coefficients:
## (Intercept) continentAmericas continentAsia
## 2194 4942 5708
## continentEurope continentOceania
## 12276 16428
fit$coefficients
## (Intercept) continentAmericas continentAsia continentEurope
## 2193.755 4942.356 5708.396 12275.721
## continentOceania
## 16427.855
Write a function called predictGdpCont that takes in a vector like fit$coefficients and the name of a continent, and returns the prediction for that continent. Inside the function, you may not refer to gapminder or lm.
For example, the following should run:
fit <- lm(gdpPercap ~ continent, data = gapminder)
predictGdpCont(fit$coefficients, "Asia") # Returns the prediction for Asia
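A minimal sketch of the idea, assuming the coefficient vector is named as in the output of fit$coefficients shown earlier: Africa has no coefficient of its own, so it is the baseline and its prediction is the intercept alone, while every other continent adds its own coefficient to the intercept. The name predictGdpContSketch is hypothetical, and the numbers are copied from the printed output so the example does not need gapminder.

```r
# Sketch only: relies on the names of the coefficient vector.
predictGdpContSketch <- function(coefs, continent) {
  base <- unname(coefs["(Intercept)"])
  if (continent == "Africa") {
    return(base)                      # baseline level: intercept only
  }
  base + unname(coefs[paste0("continent", continent)])
}

# Coefficients copied from the printed fit$coefficients
coefs <- c("(Intercept)"       = 2193.755,
           "continentAmericas" = 4942.356,
           "continentAsia"     = 5708.396,
           "continentEurope"   = 12275.721,
           "continentOceania"  = 16427.855)

predictGdpContSketch(coefs, "Asia")    # 2193.755 + 5708.396 = 7902.151
predictGdpContSketch(coefs, "Africa")  # 2193.755
```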
Make sure that the numbers you return correspond to what predict computes. You may assume the order of the coefficients will always be
"(Intercept)" "continentAmericas" "continentAsia" "continentEurope" "continentOceania"