Darwin’s Finches

Darwin’s Finches exhibit remarkable diversity, partly because the birds were isolated on different islands of the Galapagos archipelago.

Here, we will consider a dataset of the beak depth of a population of Darwin’s Finches in 1976 (before a drought) and in 1978 (after a drought). It was hypothesized that the draught, which influenced the kind of food that was available, would influence the beak depth of the birds who survived the draught.

Peter and Rosemary Grant, of PU, https://en.wikipedia.org/wiki/Peter_and_Rosemary_Grant.

First, let’s look at the data:

library(Sleuth3)
finches <- case0201

Let’s plot the beak depths in 1976 and 1978:

finches$Year <- as.character(finches$Year)
ggplot(finches, aes(x=Depth, fill = Year, y = ..prop..)) + 
    geom_bar(position = "dodge")

ggplot(finches, aes(x=Depth, fill = Year, y = ..count..)) + 
    geom_bar(position = "dodge")

Let’s also compute the means and standard deviations for the population:

finches %>% group_by(Year) %>% summarize(mean = mean(Depth), sd = sd(Depth), count = n())
## # A tibble: 2 x 4
##   Year   mean    sd count
##   <chr> <dbl> <dbl> <int>
## 1 1976   9.47 1.04     89
## 2 1978  10.1  0.906    89

So, are the beaks different or not?

Null hypothesis: the beak depths are not different. The difference we observe is due to chance.

We’ll approximate the distributions of the depths in 1976 and 1978 using normal distributions with the means and sd’s that we computed. Let’s generate fake data that looks like our data.

fake.1976 <- rnorm(n = 89, mean = 9.47, sd = 1.04)
fake.1978 <- rnorm(n = 89, mean = 10.1, sd = 0.906)
depth <- c(fake.1976, fake.1978)
year <- c(rep("1976", 89), rep("1978", 89))
fake.df <- data.frame(Depth = depth, Year = year)
fake.df$Year <- as.character(fake.df$Year)
fake.df %>% group_by(Year) %>% summarize(mean = mean(Depth), sd = sd(Depth), count = n())
## # A tibble: 2 x 4
##   Year   mean    sd count
##   <chr> <dbl> <dbl> <int>
## 1 1976   9.36 0.986    89
## 2 1978  10.1  1.02     89
ggplot(fake.df, aes(x=Depth,  fill = Year))  +
    geom_histogram(position = "dodge", bins = 30)

Now, let’s imagine the means are actually the same. How often would we get very different histrograms by chance?

  fake.1976 <- rnorm(n = 89, mean = mean(finches$Depth), sd = 1.04)
  fake.1978 <- rnorm(n = 89, mean = mean(finches$Depth), sd = 0.906)
  depth <- c(fake.1976, fake.1978)
  year <- c(rep("1976", 89), rep("1978", 89))
  fake.df <- data.frame(Depth = depth, Year = year)
  fake.df$Year <- as.character(fake.df$Year)
  
  stats <- fake.df %>% group_by(Year) %>% summarize(mean = mean(Depth), sd = sd(Depth), count = n())
  stats
## # A tibble: 2 x 4
##   Year   mean    sd count
##   <chr> <dbl> <dbl> <int>
## 1 1976   9.91 1.06     89
## 2 1978   9.80 0.817    89
  ggplot(fake.df, aes(x=Depth,  fill = Year))  +
      geom_histogram(position = "dodge", bins = 30) 

Not that often!

Let’s use numbers now. Our null hypothesis is that the means are actually the same – we’ll estimate them as mean(finches$Depth). The difference we observe is 0.63. Let’s see how often we’ll see a difference this large or larger.

ExpDepthDiff <- function(){
  fake.1976 <- rnorm(n = 89, mean = mean(finches$Depth), sd = 1.04)
  fake.1978 <- rnorm(n = 89, mean = mean(finches$Depth), sd = 0.906)
  depth <- c(fake.1976, fake.1978)
  year <- c(rep("1976", 89), rep("1978", 89))
  fake.df <- data.frame(Depth = depth, Year = year)
  fake.df$Year <- as.character(fake.df$Year)
  
  stats <- fake.df %>% group_by(Year) %>% summarize(mean = mean(Depth), sd = sd(Depth), count = n())
  return(stats$mean[1]-stats$mean[2])
}

diffs <- replicate(10000, ExpDepthDiff())
diffs.df <- data.frame(diff = diffs)
ggplot(diffs.df) + geom_histogram(mapping = aes(x = diff), bins = 30) + geom_vline(xintercept = -0.63, color = "red") + geom_vline(xintercept = 0.63, color = "red")

So, how often do we observe a difference larger than what we observe here?

mean(abs(diffs) > 0.63)
## [1] 0

We have very strong evidence against the null hypothesis that the populations have the same average beak length.