Probability and coin tosses, again

Let’s review coin tosses. Here, we’ll generate \(100\) coin tosses where the weight of the coin is 0.7 (i.e., we expect the coin to come up Heads (1) 7 times out of 10).

tosses <- rbinom(n = 100, size = 1, prob = 0.7)
tosses
##   [1] 1 0 0 0 1 1 0 0 0 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 0 1 0 1
##  [36] 1 0 0 1 0 1 1 1 1 1 1 1 0 0 1 0 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 0 1 1 0
##  [71] 1 0 1 1 0 0 1 1 1 0 1 1 1 0 0 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1

We can confirm that the coin indeed came up heads about 70% of the time:

mean(tosses)
## [1] 0.68

What is the probability mass function here?

Outcome       Tails   Heads
Probability     0.3     0.7
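
R’s dbinom function (which we’ll look at more closely below) can compute this PMF directly; with size = 1, each draw is a single Bernoulli trial:

dbinom(x = 0:1, size = 1, prob = 0.7)
## [1] 0.3 0.7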

The binomial distribution

Let’s now try something different: we’ll toss the coin 10 times and record how many times it came up Heads. Then we’ll repeat that experiment 100 times.

The result will be 100 records of the number of heads out of 10 tosses:

rbinom(n = 100, size = 10, prob = 0.7)
##   [1]  8  9  5  7  5  8  8  9  8  7  9  8  7  6  6  6  6  9  6  8  6  9  8
##  [24]  6  5  8  5  6  6  6  8  5  6  8  9  8  6  9  5  7  8  5  7  7  8  5
##  [47]  8  8  9  6  8  6  7  8  9  6  5  7  6  5  6  6  5  6  9  6  9  7  6
##  [70]  6  7  7 10  8  8  5  9  6  7  6  6  7  8  8  7  9 10  6 10  3  8  8
##  [93]  6  4  5  8  8  7  8  7

Let’s now make a histogram out of this. We’ll use 500,000 repetitions of the 10-toss experiment so that the histogram looks “smoother”:

set.seed(0)
exp10 <- rbinom(n = 500000, size = 10, prob = 0.7)
exp10.df <- data.frame(n.heads = exp10)
ggplot(exp10.df) + geom_bar(mapping = aes(x=n.heads, y = ..prop..)) + scale_x_continuous(breaks = 1:10)

We can take this histogram as representing (approximately) the probability mass function of the Binomial distribution with probability \(0.70\) and 10 trials.

In other words, here are the probabilities of obtaining \(k\) heads for each \(k\) between \(0\) and \(10\):

binom.pmf.df <- data.frame(n.heads = 0:10,
                  binom.pmf = dbinom(x = 0:10, size = 10, prob = 0.7))
binom.pmf.df
##    n.heads    binom.pmf
## 1        0 0.0000059049
## 2        1 0.0001377810
## 3        2 0.0014467005
## 4        3 0.0090016920
## 5        4 0.0367569090
## 6        5 0.1029193452
## 7        6 0.2001209490
## 8        7 0.2668279320
## 9        8 0.2334744405
## 10       9 0.1210608210
## 11      10 0.0282475249

We didn’t need to look the numbers up on the histogram (and the histogram was an approximation anyway). R can compute the PMF using a built-in formula. But it’s nice that those match up!

Let’s look more closely at dbinom for a second. dbinom computes the probability of getting a particular number of heads in size trials:

dbinom(x = 2, size = 10, prob = 0.7)
## [1] 0.001446701

You can give dbinom several inputs at once by passing a vector as the x argument:

dbinom(x = c(2, 3), size = 10, prob = 0.7)
## [1] 0.001446701 0.009001692

Here, we have 11 possible outcomes, and their probabilities always add up to 1. Think of it this way: the probability of 0 or 1 or 2 or … or 10 heads is 1 (since one of those is sure to happen). To get that probability, we could add up the probabilities of the individual events. In other words:

\(P(\text{nHeads = 0 or nHeads = 1 or ... or nHeads = 10}) = P(\text{nHeads} = 0) + P(\text{nHeads} = 1) + \dots + P(\text{nHeads} = 10) = 1\).
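
We can check this numerically by summing up all the values of the PMF:

sum(dbinom(x = 0:10, size = 10, prob = 0.7))
## [1] 1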

Here’s another application of this idea. What is the probability of the coin coming up heads at least once?

We have \[P(\text{nHeads} \geq 1) + P(\text{nHeads} = 0) = 1\]

since one of those events will definitely happen and the events are mutually exclusive. So we have

\[P(\text{nHeads} \geq 1) = 1 - P(\text{nHeads} = 0)\]

and

1 - dbinom(x = 0, size = 10, prob = 0.7)
## [1] 0.9999941

We can use the same idea to compute the probability that the coin will come up heads more than once:

\(P(\text{nHeads} = 2) + \dots + P(\text{nHeads} = 10) = 1 - P(\text{nHeads} = 0) - P(\text{nHeads} = 1)\)

1 - dbinom(x = 0, size = 10, prob = 0.7) - dbinom(x = 1, size = 10, prob = 0.7)
## [1] 0.9998563
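
Equivalently, we can sum the probabilities on the left-hand side directly, as a quick check that the two computations agree:

sum(dbinom(x = 2:10, size = 10, prob = 0.7))
## [1] 0.9998563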

Cumulative mass functions

The cumulative mass function of a distribution gives, for each value \(q\), the probability that the outcome will be \(q\) or less. You can compute the cumulative mass function for the binomial distribution using pbinom.

For example, here is the probability of the coin’s coming up heads at most once:

pbinom(q = 1, size = 10, prob = 0.7)
## [1] 0.0001436859

We can check that this works the same way with dbinom:

dbinom(x = 0, size = 10, prob = 0.7) + dbinom(x = 1, size = 10, prob = 0.7)
## [1] 0.0001436859

We indeed got the same number.
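
More generally, pbinom at \(q\) is just the running sum of dbinom up to \(q\). Here is a quick check across all 11 possible outcomes:

all.equal(pbinom(q = 0:10, size = 10, prob = 0.7),
          cumsum(dbinom(x = 0:10, size = 10, prob = 0.7)))
## [1] TRUE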

We could compute the cumulative mass function for the Bernoulli distribution too – it would just be kind of boring! (Why?)
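
(If you want to check your answer: with a single toss there are only two outcomes, so the cumulative mass function takes just two values.)

pbinom(q = 0:1, size = 1, prob = 0.7)
## [1] 0.3 1.0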

The Gaussian distribution

As we saw in the previous lecture, the Binomial distribution is “bell-shaped” when the number of trials size is large:

set.seed(0)
exp10 <- rbinom(n = 50000, size = 1000, prob = 0.7)
exp10.df <- data.frame(n.heads = exp10)
ggplot(exp10.df) + geom_bar(mapping = aes(x=n.heads, y = ..prop..)) + scale_x_continuous(limits = c(620, 780))

This is the shape of a Gaussian (or Normal) distribution. Specifically, it’s (approximately) the shape of a normal distribution with mean \(700\) and standard deviation \(14.9\). (We’ll see how to figure out those numbers later.)

Here is how we can generate data from the normal distribution:

rnorm(n = 1, mean = 700, sd = 14.9)
## [1] 705.1339
rnorm(n = 1, mean = 700, sd = 14.9)
## [1] 713.5617
rnorm(n = 1, mean = 700, sd = 14.9)
## [1] 719.4835
rnorm(n = 1, mean = 700, sd = 14.9)
## [1] 672.043

As you can see, we are now generating real numbers rather than just integers. The Gaussian distribution is a continuous distribution – that means the random process generates real numbers.

This means that the probability of generating any particular number is 0 – there are a lot of real numbers. (But we can still compute something that’s like a PMF, which will tell us how likely a number is to be generated relative to another number – the function that does that is the probability density function, or PDF.)
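
R’s dnorm computes this probability density function. As a quick illustration, we can compare the density at 700 and at 680: with these parameters, draws near 700 are roughly 2.5 times as likely as draws near 680.

# density at 700 vs. at 680; the ratio is roughly 2.5
dnorm(x = c(700, 680), mean = 700, sd = 14.9)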

Let’s generate (almost) the same histogram as before, but using the Gaussian random number generator:

exp.norm <- rnorm(n = 50000, mean = 700, sd = 14.9)
exp.norm.df <- data.frame(r = round(exp.norm))
ggplot(exp.norm.df) + geom_bar(mapping = aes(x=r, y = ..prop..)) +  scale_x_continuous(limits = c(620, 780))

We will now use the cumulative distribution function for the normal distribution to verify that the probability of getting a number smaller than 700 is 0.5:

pnorm(q = 700, mean = 700, sd = 14.9)
## [1] 0.5

The same is (approximately) true about the Binomial distribution, which looks very similar:

pbinom(q = 700, size = 1000, prob = 0.7)
## [1] 0.5119275

We can verify that the distributions are similar by checking the cumulative distribution function values elsewhere:

pnorm(q = 680, mean = 700, sd = 14.9)
## [1] 0.08975231
pbinom(q = 680, size = 1000, prob = 0.7)
## [1] 0.08978433

Finally, we can check these probabilities against our simulated draws:

mean(exp10 <= 680)
## [1] 0.0882
mean(exp.norm <= 680)
## [1] 0.0884

Pretty close!

More on the Gaussian distribution

The Gaussian distribution (or Bell Curve) is defined by its mean – the average of the random numbers it generates – and its standard deviation. The standard deviation is a measure of how “spread out” the numbers are.

The standard deviation of a set of numbers \(x\) is \[sd(x_1, x_2, x_3, ..., x_n) = \sqrt{\frac{ (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + ... + (x_n - \bar{x})^2}{n}}.\]

Here, \(\bar{x} = \frac{x_1 + x_2 + ... + x_n}{n}\) is the mean of the x’s.

The standard deviation is the square root of the average squared distance from a datapoint to the mean. If we just measured the average distance to the mean (instead of the square root of the average squared distance), we’d get a similar quantity that’s easier to understand.
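
As a small illustration with made-up numbers, here are both quantities computed directly from the formula:

x <- c(1, 2, 10, 2, 1)
x.bar <- mean(x)
sqrt(mean((x - x.bar)^2))  # root of the average squared distance (the formula above)
mean(abs(x - x.bar))       # the plain average distance to the mean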

In R, we can use sd to compute the standard deviation:

sd(c(1, 2, 10, 2, 1))    # small
## [1] 3.834058
sd(c(100, 2, 10, 2, 1))  # large
## [1] 43.19722

Let’s generate some numbers from a Gaussian distribution:

nums <- rnorm(n = 200, mean = 20, sd = 15)

We can now estimate the mean and standard deviation:

mean(nums)
## [1] 20.64939
sd(nums)
## [1] 15.7887

(Actually, multiplying an estimate computed with the formula above by \(\sqrt{\frac{n}{n-1}}\) gives a better estimate of the standard deviation. We won’t get into detail here, but the issue is that we don’t know the true mean, so we estimate it using the mean of the 200 numbers, which won’t be exact; as a result, the formula above, which divides by \(n\), systematically underestimates the true standard deviation. R’s sd() already applies this correction, dividing by \(n - 1\) instead of \(n\).)
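
Here is a quick check of that relationship, using the numbers we just generated:

n <- length(nums)
sd.formula <- sqrt(mean((nums - mean(nums))^2))  # the formula above (divides by n)
sqrt(n / (n - 1)) * sd.formula                   # the corrected estimate...
sd(nums)                                         # ...which is exactly what sd() computes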

We can approximate many distributions using the Gaussian distribution. You already saw this in action – we approximated the Binomial distribution using the Gaussian distribution.

set.seed(0)
exp10 <- rbinom(n = 50000, size = 1000, prob = 0.7)
exp10.df <- data.frame(n.heads = exp10)
mean(exp10)
## [1] 700.05
sd(exp10)*sqrt(50000/49999)  # the extra factor is essentially 1 (and sd() already divides by n - 1)
## [1] 14.50686

The Gaussian distribution with mean 700 and standard deviation 14.5 would be a good approximation here, empirically.

Approximating the binomial distribution using the Gaussian distribution

For \(size\) (the number of tosses) large enough (say, over 50) and \(prob\) reasonably close to 0.5, we can approximate the binomial distribution using the normal distribution with mean \(size\times prob\) and standard deviation \(\sqrt{size \times prob \times (1-prob)}\).
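
For our running example (size = 1000, prob = 0.7), these formulas recover the numbers we measured empirically:

1000 * 0.7              # the mean: 700
sqrt(1000 * 0.7 * 0.3)  # the standard deviation: about 14.49, close to the empirical 14.5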

This is one application of the Central Limit Theorem (we will not get into this much further here).