Introduction to Statistical Inference

Suppose that our goal is to figure out whether a coin is fair. We toss the coin 50 times, and it comes up heads 20 times. Does this mean the coin is unfair?

Not necessarily. It might be that we just got a little unlucky and saw 20 heads instead of the expected 25.

Let’s ask ourselves:

Assuming the coin is fair, what is the probability of obtaining a result at least as extreme as 20 heads?

In other words, how often would we expect 20 heads or fewer, or 30 heads or more, just by chance, even if the coin is fair?

We can compute this using pbinom:

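# P(X <= 20) + P(X >= 30) for X ~ Binomial(50, 0.5): both tails of the distribution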
pbinom(q = 20, size = 50, prob = 0.5) + (1 - pbinom(q = 29, size = 50, prob = 0.5))
## [1] 0.2026388

So, about once in every five experiments, we'd expect a fair coin to produce a result at least as extreme as 20 heads. Perhaps that's not enough to conclude that the coin is unfair.

Let’s visualize this with a histogram:

library(ggplot2)

tosses <- rbinom(n = 10000, size = 50, prob = 0.5)
tosses.df <- data.frame(n = tosses)
ggplot(tosses.df) +
  geom_bar(mapping = aes(x = n, y = after_stat(prop))) +
  geom_vline(xintercept = 20, color = "red") +
  geom_vline(xintercept = 30, color = "red")

We can get the probability of values that are at least as extreme as 20 by adding up the heights of the bars on and outside the red lines.
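We can do this addition directly from the simulated draws; the estimate should land close to the exact value computed above (the exact number will vary from run to run):

# Proportion of simulated experiments at least as extreme as 20 heads;
# should be close to the exact value 0.2026388
mean(tosses <= 20 | tosses >= 30)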

What if the proportion of heads were the same, but with 500 tosses and 200 heads? Here is the histogram:

tosses <- rbinom(n = 10000, size = 500, prob = 0.5)
tosses.df <- data.frame(n = tosses)
ggplot(tosses.df) +
  geom_bar(mapping = aes(x = n, y = after_stat(prop))) +
  geom_vline(xintercept = 200, color = "red") +
  geom_vline(xintercept = 300, color = "red")

Judging from the histogram, the probability of getting a result at least as extreme as 200 heads in 500 tosses of a fair coin is essentially 0. We can verify that:

# P(X <= 200) + P(X >= 300); note q = 299, so that 300 itself is included in the upper tail
pbinom(q = 200, size = 500, prob = 0.5) + (1 - pbinom(q = 299, size = 500, prob = 0.5))
## [1] 8.94e-06

We can also check this via simulation, though with only 1,000 repetitions we should expect to see no extreme results at all:

tosses <- rbinom(n = 1000, size = 500, prob = 0.5)
mean(tosses <= 200 | tosses >= 300)
## [1] 0
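With far more repetitions, the simulation can resolve a probability this small. A sketch (the estimate will vary from run to run):

# With 10 million repetitions we expect on the order of 90 extreme outcomes,
# giving an estimate around 9e-06, in line with the exact computation
tosses <- rbinom(n = 1e7, size = 500, prob = 0.5)
mean(tosses <= 200 | tosses >= 300)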

The p-value

What we just computed is known as the p-value: the probability of getting results at least as extreme as what we observe, if the null hypothesis is true.

In our case, the null hypothesis is that the coin is fair.
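To make this concrete, here is a small helper function (not from the examples above; it assumes, as we did, that "at least as extreme" means at least as far from the expected count, and that the expected count is a whole number):

# Two-sided p-value for an observed count under X ~ Binomial(size, prob).
# "At least as extreme" = at least as far from the expected count size * prob.
binom.p.value <- function(observed, size, prob) {
  expected <- size * prob
  d <- abs(observed - expected)
  # P(X <= expected - d) + P(X >= expected + d)
  pbinom(expected - d, size = size, prob = prob) +
    (1 - pbinom(expected + d - 1, size = size, prob = prob))
}

binom.p.value(20, size = 50, prob = 0.5)
## [1] 0.2026388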

Small p-values provide evidence against the null hypothesis – a small p-value indicates that what we observed would be unlikely if the null hypothesis were true.

N.B.: the p-value is ABSOLUTELY NOT the probability that the null hypothesis is true.

(So what is the probability that the coin is fair, in our experiment?)

(Glib answer: 0 or 1, depending on whether the coin is fair or not. We'll come back to this later.)

The Null Hypothesis

The Null Hypothesis is the hypothesis that nothing interesting is going on – e.g., that the coin is fair, or that the new drug that we administered to a group of patients is no better than a placebo.

Project 1 variable selection

Suppose you added a variable, and classification performance on the validation set improved by 0.5 percentage points, from 65.0% to 65.5%. Is this a genuine improvement?

Size of the validation set: 1000

Null hypothesis: there is no improvement. (The probability of getting an example correct is still 65%)

Model (assuming the Null Hypothesis is true): the number of correctly classified examples \(X\) follows \(\text{Binomial}(1000, 0.65)\); that is, \(P(X = x)\) is given by dbinom(x = x, size = 1000, prob = 0.65).

The probability of exactly 655 correct classifications out of 1000 trials:

dbinom(x = 655, size = 1000, prob = .65)
## [1] 0.02510964

P-value: \(P(X \ge 655 \text{ or } X \le 645)\) =

pbinom(q = 645, size = 1000, prob = .65) + (1-pbinom(q = 654, size = 1000, prob = .65))
## [1] 0.7654586
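As before, we can sanity-check this by simulation (the estimate will vary slightly from run to run):

# Simulate the number of correct classifications under the Null Hypothesis
correct <- rbinom(n = 10000, size = 1000, prob = 0.65)
# Proportion of simulated runs at least as extreme as 655;
# should be close to the exact value 0.7654586
mean(correct >= 655 | correct <= 645)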

(Note: we made a number of simplifying assumptions. In particular, we assumed that we knew that the performance of the previous model was exactly 65%, when in fact we only had an estimate of that performance.)
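To see how much that matters: if the baseline 65.0% was itself estimated from 1000 validation examples, it carries a standard error of about 1.5 percentage points, three times the 0.5-point improvement we are trying to detect:

# Standard error of an accuracy of 0.65 estimated from 1000 examples
sqrt(0.65 * 0.35 / 1000)
## [1] 0.0150831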