---
title: "Precept 3 Problem Set"
output:
  html_document:
    df_print: paged
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(gapminder)
```

Create and save the file `p3.R` on your computer. The file `p3.R`, which should contain all the required functions, should be submitted on Blackboard once you have completed them.

You must work with one partner, and you should work collaboratively. Both partners are responsible for being able to explain the work that's been done to the preceptor. Additionally, work submitted on Blackboard will be graded for correctness. *Only one of the partners should submit the code on Blackboard*.

The file `p3.R` should include the following code, with the NetIDs of the two partners substituted in:

```{r}
student.netid.1 <- "netid1"
student.netid.2 <- "netid2"
assignment <- "precept3"
```

In addition, a comment should contain the full names of the partners.

Some of you will be tempted to use `for` and `while` loops to solve some of the problems below (if you've used those before). Please don't do this: the goal here is to try to use R the way professional data scientists use it, which usually means no loops.

### Problem 1: Predicting Life Expectancy

#### Problem 1(a) (submit on Blackboard)

Write a function called `MostAccuratePred` that takes in a dataset in the same format as `gapminder %>% filter(year == 1982)` (i.e., the year is always the same), and finds the 10 countries for which the predictions of life expectancy (`lifeExp`) based on `log(gdpPercap)` are the most accurate.

#### Problem 1(b) (not for submission on Blackboard)

Review the prediction errors (what is a good systematic way of doing that?). Do you notice any patterns? Discuss your hypotheses about the patterns, if any, with the preceptors.

### Problem 2: Approximating Linear Regression

In this problem, you will write a function that finds good coefficients for linear regression.

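
Before searching for a good coefficient, it may help to pin down what "error" means for a single candidate line. The sketch below is one possible choice, assuming the sum of squared errors (the measure named in the hint for Problem 2(b)). The helper name `sse` and the data frame `toy` are made up for illustration and are not part of the assignment.

```r
# Hypothetical helper: sum of squared errors of the line
# Y = coefficient * X + intercept on a data frame with columns X and Y.
sse <- function(data, coefficient, intercept) {
  predicted <- coefficient * data$X + intercept  # vectorized: one prediction per row
  sum((data$Y - predicted)^2)
}

# Made-up data lying exactly on Y = 2*X + 1
toy <- data.frame(X = c(0, 1, 2), Y = c(1, 3, 5))

sse(toy, coefficient = 2, intercept = 1)  # the true line: no error at all
sse(toy, coefficient = 0, intercept = 1)  # a flat line: much larger error
```

Note that no loop is needed: arithmetic on `data$X` applies to every row at once.
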
#### Problem 2(a): Finding a good coefficient for linear regression (submit on Blackboard)

Write a function called `my.lm` that could be used to find the coefficient in Simple Linear Regression when the intercept is given. The function could be used like this:

    my.data <- data.frame(X = c(1, 2, 3), Y = c(3.1, 4.9, 7.05))
    my.lm(my.data, intercept = 1)  # Returns approximately 2, since Y ~ 2*X + 1

The function should work as follows. First, generate possible values for the coefficient using

```{r}
seq(-5, 5, 0.1)
```

Then, of those possible coefficients, find the one that produces the smallest error.

Try different inputs, and make sure that the answer you are getting is close to the answer you would expect. Explain to your preceptor how you came up with the different inputs.

#### Problem 2(b): Finding a good coefficient (challenge, not for submission on Blackboard)

Now, write a function that finds both a good intercept and a good coefficient.

Hint: use a modification of the function in 2(a) that returns both the coefficient and the sum of squared errors it produces. Then apply that function to every candidate intercept.

### Problem 3: Categorical Variables (submit on Blackboard)

In class, we ran the following:

```{r}
fit <- lm(gdpPercap ~ continent, data = gapminder)
fit
fit$coefficients
```

Write a function called `predictGdpCont` that takes in a vector like `fit$coefficients` and the name of a continent, and returns the prediction for that continent. Inside the function, you may *not* refer to `gapminder` or `lm`. For example, the following should run:

    fit <- lm(gdpPercap ~ continent, data = gapminder)
    predictGdpCont(fit$coefficients, "Asia")  # Returns the prediction for Asia

Make sure that the numbers you return correspond to what `predict` computes.

You may assume the order of the coefficients will always be

    "(Intercept)" "continentAmericas" "continentAsia" "continentEurope" "continentOceania"
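
To see how the coefficients of a categorical predictor relate to what `predict` returns, here is a minimal sketch on made-up data rather than `gapminder`. The data frame `toy`, its columns `group` and `y`, and all values are assumptions for illustration only. With R's default coding, the first factor level acts as the baseline and has no coefficient of its own.

```r
# Made-up data: three groups with different means (a: 2, b: 11, c: 22)
toy <- data.frame(
  group = factor(c("a", "a", "b", "b", "c", "c")),
  y     = c(1, 3, 10, 12, 20, 24)
)

toy.fit <- lm(y ~ group, data = toy)
coefs <- toy.fit$coefficients
names(coefs)  # the baseline level "a" does not get its own coefficient

# The baseline level is predicted by the intercept alone;
# any other level adds its own coefficient to the intercept.
pred.a <- unname(coefs["(Intercept)"])
pred.b <- unname(coefs["(Intercept)"] + coefs["groupb"])

# Compare against what predict() computes for the same two groups
new.data <- data.frame(group = factor(c("a", "b"), levels = levels(toy$group)))
from.predict <- unname(predict(toy.fit, newdata = new.data))
all.equal(c(pred.a, pred.b), from.predict)
```

The same kind of comparison against `predict` is a reasonable way to check your own function on the continent coefficients.
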