SML201 Intro to Probability

Bernoulli variables

A Bernoulli variable can be thought of as the outcome of a toss of a biased coin. We can get one outcome using

rbinom(n = 1, size = 1, prob = 0.8)

## [1] 1

For Bernoulli variables, size = 1 always. We'll see situations where size is not 1 in a second. n = 1 means we want one coin toss. If we want the outcomes of 10 coin tosses, we can use

rbinom(n = 10, size = 1, prob = 0.8)

##  [1] 1 1 1 1 1 1 1 1 1 0
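As a quick sanity check, the proportion of 1s in a large number of Bernoulli draws should be close to prob. The following is a minimal sketch; the seed, the number of tosses, and the variable names tosses and prop_heads are arbitrary choices made here for illustration.

```r
set.seed(1)  # arbitrary seed, used only to make the run reproducible

# 100,000 tosses of a coin with P(heads) = 0.8; each toss is 1 (heads) or 0 (tails)
tosses <- rbinom(n = 100000, size = 1, prob = 0.8)

# The mean of a 0/1 vector is the proportion of 1s
prop_heads <- mean(tosses)
prop_heads
```

The result should be close to 0.8, though it will not be exactly 0.8 because the tosses are random.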

Binomial variables

A binomial variable is the number of times the coin came up heads out of size tosses. For example, here are the outcomes of 100 experiments, where we tossed the coin twice each time:

rbinom(n = 100, size = 2, prob = 0.8)

##   [1] 2 2 2 1 1 1 2 2 0 1 2 1 2 1 2 2 2 1 2 1 2 2 2 1 2 1 2 1 2 2 2 2 2 1 2 2 2
##  [38] 2 2 1 1 2 2 1 1 2 1 1 1 2 2 1 2 1 2 1 2 2 2 2 1 2 1 2 1 1 1 1 2 1 2 2 1 2
##  [75] 2 1 0 2 1 1 2 2 1 2 2 1 1 2 2 2 2 0 2 2 2 2 1 2 1 2

heads: 1, tails: 0
rbinom returns the number of 1's (heads) in each experiment
size: number of tosses per experiment
n: number of experiments
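To make the roles of size and n concrete, here is a sketch that builds binomial outcomes by hand out of Bernoulli tosses. It assumes the same coin as above (prob = 0.8); the seed and the names one_experiment and heads_counts are made up for this illustration.

```r
set.seed(1)  # arbitrary seed, used only to make the run reproducible

# One experiment: toss the coin twice (two Bernoulli draws) and count the heads
one_experiment <- sum(rbinom(n = 2, size = 1, prob = 0.8))

# 100 experiments: repeat the two-toss experiment 100 times
heads_counts <- replicate(100, sum(rbinom(n = 2, size = 1, prob = 0.8)))

# Tally how often each number of heads occurred
table(heads_counts)
```

Each entry of heads_counts has the same distribution as one draw from rbinom(n = 1, size = 2, prob = 0.8), so rbinom with size = 2 can be read as a shortcut for this loop.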

We can estimate the value of the probability mass function at a particular value by repeating the same experiment many times. For example, suppose we want to know how often the coin comes up heads 0 times, when the coin is weighted with P(heads) = 0.8, and it is tossed twice per experiment.

Here is the proportion of the time that the coin came up heads 0 times in 1,000,000 experiments:

mean(rbinom(n = 1000000, size = 2, prob = 0.8) == 0)

## [1] 0.039996

We can obtain the exact probability directly using dbinom:

dbinom(x = 0, size = 2, prob = 0.8)

## [1] 0.04

dbinom computes the probability mass function for a particular x (the number of times the coin came up Heads).
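For reference, the binomial pmf has the closed form choose(n, k) * p^k * (1 - p)^(n - k), where n is the number of tosses, k the number of heads, and p the probability of heads. The sketch below checks that formula against dbinom for the example above; the variable names n_tosses, k_heads, p_heads, and manual are chosen here for illustration.

```r
n_tosses <- 2    # tosses per experiment (the size argument)
k_heads  <- 0    # number of heads we are asking about (the x argument)
p_heads  <- 0.8  # probability of heads on a single toss (the prob argument)

# Binomial pmf written out by hand
manual <- choose(n_tosses, k_heads) * p_heads^k_heads * (1 - p_heads)^(n_tosses - k_heads)

# Compare with dbinom, allowing for floating-point rounding
all.equal(manual, dbinom(x = k_heads, size = n_tosses, prob = p_heads))
```

With k_heads = 0 the formula reduces to (1 - 0.8)^2, the probability of two tails in a row.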

We can use dbinom for Bernoulli variables as well:

dbinom(x = 1, size = 1, prob = 0.65)

## [1] 0.65

(Of course – the probability of the coin coming up heads once if the probability of heads is 65% is… 65%)

dbinom(x = 0, size = 1, prob = 0.65)

## [1] 0.35

…and the probability of the coin coming up Tails must be 35%.

We can display the probability mass function by computing the value of dbinom for every possible value.

We use a trick where we can plug in multiple values of x at the same time:

dbinom(x = c(0, 1), size = 1, prob = 0.5)

## [1] 0.5 0.5

We are now ready to display the pmf for a binomial variable with 20 coin tosses and a fair coin:

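library(ggplot2)  # provides ggplot(), aes() and geom_bar()
x <- 0:20  # every possible number of heads in 20 tosses
df <- data.frame(x = x, prob = dbinom(x = x, size = 20, prob = 0.5))
ggplot(df) + 
  geom_bar(mapping = aes(x = x, y = prob), stat = "identity")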

We can get the same kind of visualization by actually performing the experiment 10,000 times and tallying the number of times that each outcome occurred:

dat <- data.frame(x = rbinom(n = 10000, size = 20, prob = 0.5))
# after_stat() replaces the older ..count.. notation in recent versions of ggplot2
ggplot(dat) + 
  geom_histogram(mapping = aes(x = x, y = after_stat(count / sum(count))), binwidth = 1)
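The two plots can also be compared numerically. The sketch below tabulates the simulated proportions next to the exact pmf values; the seed and the names sims, empirical, theoretical, and comparison are assumptions made for this example.

```r
set.seed(1)  # arbitrary seed, used only to make the run reproducible

sims <- rbinom(n = 10000, size = 20, prob = 0.5)

# Tabulate over all values 0..20 so that outcomes that never occurred get a count of 0
empirical <- as.numeric(table(factor(sims, levels = 0:20))) / length(sims)

# Exact pmf values for the same outcomes
theoretical <- dbinom(x = 0:20, size = 20, prob = 0.5)

comparison <- data.frame(x = 0:20, empirical = empirical, theoretical = theoretical)

# Largest gap between the simulated proportion and the exact probability
max(abs(comparison$empirical - comparison$theoretical))
```

The gap should be small, and it tends to shrink as the number of simulated experiments grows.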

Intro to cumulative probability

One way to compute the probability that a fair coin comes up heads between 15 and 20 times in 20 tosses is to sum up the probabilities that it comes up heads exactly 15, 16, 17, 18, 19, and 20 times:

sum(dbinom(x = c(15, 16, 17, 18, 19, 20), size = 20, prob = 0.5))

## [1] 0.02069473

We can also use cumulative probability for that. pbinom computes the cumulative distribution function for binomial variables, i.e., the probability that the number of heads is at most q. For example, the probability that the coin will come up heads at most 3 times is:

pbinom(q = 3, size = 20, prob = 0.5) # Note: we use q, not x

## [1] 0.001288414

We can also do this “manually” using dbinom:

sum(dbinom(x = c(0, 1, 2, 3), size  = 20, prob = 0.5))

## [1] 0.001288414
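The same relationship holds for every cutoff at once: the cumulative probabilities are the running sums of the pmf. Here is a short sketch of that check; pmf, cdf_manual, and cdf_builtin are names introduced only for this example.

```r
# pmf at every possible number of heads in 20 tosses of a fair coin
pmf <- dbinom(x = 0:20, size = 20, prob = 0.5)

# Running sum of the pmf: P(X <= 0), P(X <= 1), ..., P(X <= 20)
cdf_manual <- cumsum(pmf)

# pbinom accepts a vector of q values, just as dbinom accepts a vector of x values
cdf_builtin <- pbinom(q = 0:20, size = 20, prob = 0.5)

# Compare the two, allowing for floating-point rounding
all.equal(cdf_manual, cdf_builtin)
```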

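Now, we can compute the probability that a fair coin will come up heads between 15 and 20 times if there are 20 tosses using pbinom. We subtract the cumulative probability at q = 14 rather than q = 15, because pbinom at q = 14 is the probability of at most 14 heads, and removing it leaves exactly the outcomes 15 through 20: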

pbinom(q = 20, size = 20, prob = 0.5) - pbinom(q = 14, size = 20, prob = 0.5)

## [1] 0.02069473
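Since 20 is the largest possible number of heads here, this is the upper tail P(X > 14). As an alternative sketch, pbinom can return the upper tail directly through its lower.tail argument; the names upper and complement are chosen here for illustration.

```r
# P(X > 14), i.e. P(X >= 15), requested directly as an upper tail
upper <- pbinom(q = 14, size = 20, prob = 0.5, lower.tail = FALSE)

# The same quantity as one minus the cumulative probability at 14
complement <- 1 - pbinom(q = 14, size = 20, prob = 0.5)

# Both should agree with each other, up to floating-point rounding
all.equal(upper, complement)
```

Both values should match the sum of dbinom over 15 through 20 computed earlier.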