Precept 1 Problem Set: Solutions

Problem 1: Vectors and Variables

Define the vector (42, 43, 45, 49, 501), and store it in a variable called my.vec.

Write code to extract the second and fourth elements of the vector. Explain the difference between my.vec and

Solution

First, we’ll define the vector:

my.vec <- c(42, 43, 45, 49, 501)

There are several ways to extract the elements we want:

my.vec[c(FALSE, TRUE, FALSE, TRUE, FALSE)] # logical mask; picks exactly elements 2 and 4 only if my.vec is of length 5

my.vec[c(2, 4)]

c(my.vec[2], my.vec[4])
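To see why the logical mask depends on the length of the vector, here's a small sketch. The vectors short.vec and long.vec are made up for illustration and aren't part of the problem.

```r
mask <- c(FALSE, TRUE, FALSE, TRUE, FALSE)

# Hypothetical vectors of length 3 and length 9
short.vec <- c(42, 43, 45)
long.vec  <- c(42, 43, 45, 49, 501, 7, 8, 9, 10)

# The mask is longer than short.vec, so the missing 4th position comes back as NA
short.res <- short.vec[mask]

# The mask is shorter than long.vec, so R recycles it and also picks positions 7 and 9
long.res <- long.vec[mask]

print(short.res)
print(long.res)
```

Numeric indexing with c(2, 4) doesn't have this recycling issue, which is one reason to prefer it here.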

Problem 2(a): Functions

Write a function that takes in a vector and returns a new vector that contains the second and fourth elements of that vector. Test this function by calling it. Include the calls to the function in the file you are submitting/showing to your preceptor.

Solution

ExtractElem24 <- function(v){
  return(v[c(2, 4)])
}

# Now, let's try using the function

ExtractElem24(c(1, 2, 3, 4, 5, 6))

## [1] 2 4

ExtractElem24(c(10, 8, 6, 4))

## [1] 8 4
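It's also worth trying an input that is too short. The sketch below repeats the function definition so it runs on its own; the length-3 input is our own test case, not one from the problem.

```r
ExtractElem24 <- function(v){
  return(v[c(2, 4)])
}

# There is no fourth element, so R fills that slot with NA rather than raising an error
short.result <- ExtractElem24(c(1, 2, 3))
print(short.result)
```

If an NA isn't what you want in this case, one option is to check length(v) at the top of the function and call stop() when it is less than 4.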

Problem 2(b): Functions and Conditionals

Write a function that outputs the real solutions to the quadratic equation \(ax^2 + bx + c = 0\), given \(a\), \(b\), and \(c\). Test your function when there are two, one, and no real solutions.

Solution

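quad_soln <- function(a, b, c){
  disc <- b^2 - 4*a*c                  # the discriminant decides how many real roots there are
  if(disc > 0){
    # Two distinct real roots
    r1 <- (-b + sqrt(disc))/(2*a)
    r2 <- (-b - sqrt(disc))/(2*a)
    return(c(r1, r2))
  }else if(disc == 0){
    # One repeated root (sqrt(disc) is 0 here)
    r1 <- -b/(2*a)
    return(r1)
  }else{
    # No real roots: c() is NULL
    return(c())
  }
}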

# Try coefficients where there are two solutions, one solution, and no solutions  

quad_soln(1, -5, 6)

## [1] 3 2

quad_soln(1, -2, 1)

## [1] 1

quad_soln(2, 0, 5)

## NULL
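One caveat: with non-integer coefficients, rounding error can leave the discriminant at a tiny nonzero value, so an exact comparison with 0 may miss the one-solution case. Below is a minimal sketch of a tolerance-based variant. The name quad_soln_tol, the tol argument, and its default value are our own assumptions, and an absolute tolerance like this only makes sense when the coefficients are of moderate size.

```r
# Hypothetical variant of quad_soln that treats a near-zero discriminant as zero
quad_soln_tol <- function(a, b, c, tol = 1e-9){
  if(a == 0){
    stop("a must be nonzero for a quadratic equation")
  }
  disc <- b^2 - 4*a*c
  if(abs(disc) < tol){
    # Treat as one repeated root
    return(-b/(2*a))
  }else if(disc > 0){
    # Two distinct real roots
    return(c((-b + sqrt(disc))/(2*a), (-b - sqrt(disc))/(2*a)))
  }else{
    # No real roots
    return(c())
  }
}

# (x - 0.1)^2 = x^2 - 0.2x + 0.01 has the single root 0.1
one.root <- quad_soln_tol(1, -0.2, 0.01)
print(one.root)
```

The variant also stops with an error when a is 0, since the quadratic formula would otherwise divide by zero.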

Problem 3: Gapminder

Problem 3(a)

Write a function that computes how many countries in the dataset there are on a given continent. Test this function by querying it with different continent names.

Solution

library(dplyr)      # for %>%, filter, select, n_distinct
library(gapminder)  # assuming the data comes from the gapminder package

CountCountries <- function(gapminder, cont){
  n.countries <- gapminder %>% filter(continent == cont) %>%  # Keep only the requested continent
                               select(country) %>%            # We'll be using n_distinct, so we
                                                              # want rows to be just countries
                               n_distinct()
  return(n.countries)
}

CountCountries(gapminder, "Oceania")

## [1] 2

CountCountries(gapminder, "Africa")

## [1] 52

CountCountries(gapminder, "Americas")

## [1] 25
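A quick way to gain confidence in a function like this is to run the same logic on a tiny table where we can count by hand. The sketch below uses base R only; the toy data frame and the name CountCountriesBase are made up for illustration.

```r
# Hypothetical toy data: countries A and B on continent X; C, D and E on continent Y
toy <- data.frame(
  country   = c("A", "A", "B", "C", "C", "D", "E"),
  continent = c("X", "X", "X", "Y", "Y", "Y", "Y"),
  year      = c(2002, 2007, 2007, 2002, 2007, 2007, 2007)
)

# Base-R version of the same idea: keep one continent, then count distinct countries
CountCountriesBase <- function(df, cont){
  length(unique(df$country[df$continent == cont]))
}

n.x <- CountCountriesBase(toy, "X")  # A appears twice but is counted once
n.y <- CountCountriesBase(toy, "Y")
n.z <- CountCountriesBase(toy, "Z")  # a continent that isn't in the table
print(c(n.x, n.y, n.z))
```

Repeated rows for the same country don't inflate the count, which is the property that n_distinct provides in the dplyr version.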

Problem 3(b)

Write a function that takes in a data frame like gapminder, and returns the country with the largest life expectancy on a given continent between the years y1 and y2. Test this function.

Solution

LargestLifeExp <- function(gapminder, cont, y1, y2){
  CountryLargeLE <- gapminder %>% filter(continent == cont) %>% 
                                  filter(year >= y1, year <= y2) %>% 
                                  summarize(country = country[which.max(lifeExp)])
  return(as.character(CountryLargeLE$country)) # Extract the column so we're not returning
                                               # a data frame, and then convert to 
                                               # character
}
                           
# Let's try this

LargestLifeExp(gapminder, "Asia", 1950, 1990)

## [1] "Japan"

LargestLifeExp(gapminder, "Americas", 1950, 1990)

## [1] "Canada"

# To really test the function, we need a case where we already know the answer.
# One way to do this is to run it on a small subset of the data:

gapminder.small <- gapminder[1:108, ]

# We can now pretty easily figure out the answer by looking at the now-small table.

LargestLifeExp(gapminder.small, "Asia", 1950, 1990)

## [1] "Bahrain"
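Another option is to build a tiny table by hand so the answer is known by construction. The sketch below does this in base R; the toy data and the name LargestLifeExpBase are assumptions for illustration, not part of the original solution.

```r
# Hypothetical toy data: on continent X, B leads in 1952 but A leads in 2007
toy <- data.frame(
  country   = c("A", "A", "B", "B", "C", "C"),
  continent = c("X", "X", "X", "X", "Y", "Y"),
  year      = c(1952, 2007, 1952, 2007, 1952, 2007),
  lifeExp   = c(40, 80, 50, 70, 60, 90)
)

# Base-R version of the same logic: filter by continent and years, then take the max
LargestLifeExpBase <- function(df, cont, y1, y2){
  keep <- df$continent == cont & df$year >= y1 & df$year <= y2
  sub <- df[keep, ]
  as.character(sub$country[which.max(sub$lifeExp)])
}

early <- LargestLifeExpBase(toy, "X", 1950, 1990)  # only the 1952 rows qualify
full  <- LargestLifeExpBase(toy, "X", 1950, 2010)  # the 2007 rows are included too
print(c(early, full))
```

Because the winner changes with the year range, this table also checks that the year filter is actually being applied.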

Problem 3(c)

Write a function that computes the world population in a given year. Test this function.

Solution

# The idea here is to sum up the populations in a given year

WorldPopulation <- function(gapminder, y){
  total.pop <- gapminder %>% filter(year == y) %>% 
                             summarize(total = sum(as.numeric(pop)))
  return(total.pop$total)
}

WorldPopulation(gapminder, 2007)

## [1] 6251013179

WorldPopulation(gapminder, 1962)

## [1] 2899782974
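The as.numeric() call in the solution is presumably there because populations stored as integers can overflow when summed. Here's a small sketch of the issue with two made-up values:

```r
# Two hypothetical integer populations; each fits in an R integer, but their sum doesn't
big.pops <- c(2000000000L, 2000000000L)

# Integer sum exceeds .Machine$integer.max, so R returns NA (with a warning)
int.total <- suppressWarnings(sum(big.pops))

# Converting to double first avoids the overflow
num.total <- sum(as.numeric(big.pops))

print(int.total)
print(num.total)
```

Since world population totals are well above .Machine$integer.max, converting before summing is the safer choice.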

Problem 4 (Challenge)

Make a new data frame which contains the increase in life expectancy per year for each country in gapminder. The increase per year is the difference between the life expectancy in the last year and the first year, divided by the number of years between them.

Solution

# Let's write a function that gets the answer for a subset
# of gapminder that just has one country
IncreaseRate <- function(year, lifeExp){
  y1 <- min(year)
  y2 <- max(year)
  
  LE1 <- lifeExp[which.min(year)]
  LE2 <- lifeExp[which.max(year)]
  
  return( (LE2-LE1)/(y2-y1) )
}

LifeExpIncRate <- function(gapminder){
  ans <- gapminder %>% group_by(country) %>% 
                       summarize(le_inc_rate = IncreaseRate(year, lifeExp))
  return(ans)
}

inc.rates <- LifeExpIncRate(gapminder)

# display the countries with the fastest increase

arrange(inc.rates, desc(le_inc_rate))
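As with the earlier problems, we can check IncreaseRate on a small example where the answer is easy to work out by hand. The sketch below repeats the function so it runs on its own; the toy years and life expectancies are made up, and the rows are deliberately out of order.

```r
IncreaseRate <- function(year, lifeExp){
  y1 <- min(year)
  y2 <- max(year)

  LE1 <- lifeExp[which.min(year)]
  LE2 <- lifeExp[which.max(year)]

  return( (LE2-LE1)/(y2-y1) )
}

# Hypothetical data for one country, not sorted by year
toy.year <- c(2007, 1952, 1977)
toy.le   <- c(62, 40, 55)

# First year is 1952 (40), last year is 2007 (62), so the rate is (62 - 40)/(2007 - 1952)
toy.rate <- IncreaseRate(toy.year, toy.le)
print(toy.rate)
```

Using which.min and which.max on year means the function doesn't rely on the rows being sorted.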