Problem 1

Suppose 65% of Princeton students like World Coffee better than Hoagie Haven. We selected a random sample of 100 students, and asked them which they prefer. What is the probability that more than 78 students said “World Coffee”?

Use the normal approximation to the Binomial distribution (recall: the mean is \(n\times prob\) and the variance is \(n\times prob\times (1-prob)\)). Verify that the answer you get using pnorm is approximately the same as the exact answer you get using pbinom.

Solution

1 - pnorm(q = 78.9, mean = 65, sd = sqrt(.65*.35*100))
## [1] 0.001782825
1 - pbinom(q = 78, size = 100, prob = .65)
## [1] 0.001686446
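
The solution above places the normal cutoff at 78.9. The conventional continuity correction instead cuts halfway between 78 and 79, at 78.5. The sketch below computes that variant next to the exact binomial tail so the two can be compared. The variable names (n, p, mu, sigma, approx.cc, exact, exact.sum) are illustrative and not part of the original solution.

```r
n <- 100
p <- 0.65
mu <- n * p                      # binomial mean
sigma <- sqrt(n * p * (1 - p))   # binomial standard deviation

# P(X > 78) = P(X >= 79); the conventional continuity correction cuts at 78.5
approx.cc <- pnorm(q = 78.5, mean = mu, sd = sigma, lower.tail = FALSE)

# Exact binomial tail, for comparison
exact <- pbinom(q = 78, size = n, prob = p, lower.tail = FALSE)

# The same exact tail written as a sum over the probability mass function
exact.sum <- sum(dbinom(x = 79:100, size = n, prob = p))

print(c(approx.cc = approx.cc, exact = exact, exact.sum = exact.sum))
```

Using lower.tail = FALSE avoids computing 1 minus a number close to 1, which matters more as tail probabilities get smaller. How closely the approximation tracks the exact value this far into the tail depends on the cutoff chosen.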

Problem 2

You sample the following 10 measurements from \(N(\mu, \sigma^2)\):

set.seed(0)
rnorm(n = 10, mean = 0.1, sd = 2)
##  [1]  2.62590857 -0.55246672  2.75959853  2.64485864  0.92928287
##  [6] -2.97990008 -1.75713407 -0.48944089  0.08846565  4.90930678

Evaluate the evidence against the null hypothesis that \(\mu = 0\).

Now, do the same for rnorm(n = 100000000, mean = 0.1, sd = 2)

What do you observe?

Solution

For \(n = 10\):

set.seed(0)
my.sample <- rnorm(n = 10, mean = 0.1, sd = 2)

# t statistic for the null hypothesis mu = 0
t.stat <- mean(my.sample)/(sd(my.sample)/sqrt(10))
# Two-sided p-value: probability mass in both tails beyond |t.stat|
2 * pt(-abs(t.stat), df = 9)
## [1] 0.3112661
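
The solution above covers only \(n = 10\). A minimal sketch of the large-sample case follows, under two assumptions: the helper name one.sample.pval is hypothetical, and n = 1e6 stands in for n = 100000000, since a sample of that size needs roughly 800 MB of memory on its own.

```r
# Hypothetical helper: two-sided one-sample t-test p-value for H0: mu = mu0
one.sample.pval <- function(x, mu0 = 0){
  n <- length(x)
  t.stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))
  2 * pt(-abs(t.stat), df = n - 1)
}

set.seed(0)
# Same 10 draws as in the problem statement
p.small <- one.sample.pval(rnorm(n = 10, mean = 0.1, sd = 2))

# Assumption: a smaller stand-in for n = 100000000 to keep memory use modest
p.large <- one.sample.pval(rnorm(n = 1e6, mean = 0.1, sd = 2))

print(c(p.small = p.small, p.large = p.large))
```

With a true mean of 0.1 and a standard deviation of 2, the t statistic is expected to be near \(0.1/(2/\sqrt{n})\), which is about 50 for \(n = 10^6\) and about 500 for \(n = 10^8\). The p-value for the large sample is therefore vanishingly small, even though the difference from 0 is the same small 0.1 that the sample of 10 could not detect. A small p-value reflects the amount of evidence, not the size of the effect.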

Problem 3

Write a function that computes the p-value in a situation analogous to what we had with the finches: we’d like to compare the means of two samples from normal distributions, and compute the p-value for the null hypothesis that the two means are equal. You may only use rnorm. You cannot use pt or t.test.

You are encouraged to use the lecture as little as possible. Especially if you use the lecture, explain every line of code.

Compare the outputs of your function to the outputs of t.test.

Solution

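t.pval <- function(sample1, sample2){
  # Under the null hypothesis both samples share one mean; estimate it by pooling
  s.mean <- mean(c(sample1, sample2))
  # Each group keeps its own standard deviation and sample size
  sd1 <- sd(sample1)
  sd2 <- sd(sample2)
  n1 <- length(sample1)
  n2 <- length(sample2)
  # Observed difference between the two sample means
  act.diff <- mean(sample1) - mean(sample2)

  # Simulate 100000 pairs of samples under the null hypothesis and record whether
  # each simulated difference is at least as extreme as the observed one
  sim.extreme <- replicate(100000,
                           abs(mean(rnorm(n = n1, mean = s.mean, sd = sd1)) -
                               mean(rnorm(n = n2, mean = s.mean, sd = sd2))) >= abs(act.diff))
  # The p-value estimate is the fraction of simulations that were that extreme
  return(mean(sim.extreme))
}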

library(Sleuth3)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.0.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.8
## ✔ tidyr   0.8.2     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## ── Conflicts ────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
finches <- case0201
sample1 <- (finches %>% filter(Year == "1976"))$Depth
sample2 <- (finches %>% filter(Year == "1978"))$Depth
t.pval(sample1, sample2)
## [1] 0
t.pval(sample1+0.3, sample2)
## [1] 0.01158
t.pval(sample1+0.4, sample2)
## [1] 0.06451
t.test(sample1+0.4, sample2)$p.value
## [1] 0.0673325
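
The same comparison can be tried without the finch data. The sketch below is one possible variant, not the solution's method: sim.pval is a hypothetical helper that draws all simulated samples at once as matrices, the data x and y are made up, and n.sim = 20000 is an arbitrary choice.

```r
set.seed(1)
# Hypothetical data: two normal samples with slightly different means
x <- rnorm(n = 50, mean = 10.0, sd = 1)
y <- rnorm(n = 50, mean = 10.3, sd = 1)

# Hypothetical helper: simulation-based two-sided p-value for equal means
sim.pval <- function(sample1, sample2, n.sim = 20000){
  s.mean <- mean(c(sample1, sample2))   # common mean under the null hypothesis
  n1 <- length(sample1)
  n2 <- length(sample2)
  # One column per simulated sample
  sim1 <- matrix(rnorm(n1 * n.sim, mean = s.mean, sd = sd(sample1)), nrow = n1)
  sim2 <- matrix(rnorm(n2 * n.sim, mean = s.mean, sd = sd(sample2)), nrow = n2)
  sim.diff <- colMeans(sim1) - colMeans(sim2)
  # Fraction of simulated differences at least as extreme as the observed one
  mean(abs(sim.diff) >= abs(mean(sample1) - mean(sample2)))
}

p.sim <- sim.pval(x, y)
p.welch <- t.test(x, y)$p.value
print(c(p.sim = p.sim, p.welch = p.welch))
```

Two caveats apply when reading such comparisons. First, a simulated p-value has Monte Carlo error of roughly \(\sqrt{p(1-p)/n_{sim}}\), so an estimate of 0 only means that none of the simulated differences reached the observed one, not that the p-value is exactly zero. Second, the simulation treats the estimated standard deviations as if they were the true ones, so its p-values tend to sit slightly below those of t.test, most noticeably for small samples.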

Problem 4 (Take-home challenge)

The speed of light in vacuum is 299,792,458 m/s. The geographic coordinates of the Great Pyramid of Giza are 29.9792458N, 31.134658E. How weird is that?

Estimate the probability of observing something as weird or weirder for a site of comparable importance to the Great Pyramid, assuming the null hypothesis that aliens did not undertake large-scale architectural projects on Earth.