A Bernoulli variable can be thought of as the outcome of a toss of a biased coin. We can get one outcome using
rbinom(n = 1, size = 1, prob = 0.8)
## [1] 1
For Bernoulli variables, size = 1 always. We’ll see situations where size is not 1 in a second.
n = 1 means we want one coin toss. If we want the outcomes of 10 coin tosses, we can use
rbinom(n = 10, size = 1, prob = 0.8)
## [1] 1 1 1 1 1 1 1 1 1 0
A binomial variable is the number of times the coin came up heads out of size tosses. For example, here are the outcomes of 100 experiments, where we tossed the coin twice each time:
rbinom(n = 100, size = 2, prob = 0.8)
## [1] 2 2 2 1 1 1 2 2 0 1 2 1 2 1 2 2 2 1 2 1 2 2 2 1 2 1 2 1 2 2 2 2 2 1 2 2 2
## [38] 2 2 1 1 2 2 1 1 2 1 1 1 2 2 1 2 1 2 1 2 2 2 2 1 2 1 2 1 1 1 1 2 1 2 2 1 2
## [75] 2 1 0 2 1 1 2 2 1 2 2 1 1 2 2 2 2 0 2 2 2 2 1 2 1 2
Here heads is coded as 1 and tails as 0, so rbinom returns the number of 1's (heads) in each experiment. size is the number of tosses per experiment, and n is the number of experiments.
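To make the link between Bernoulli and binomial variables concrete, here is a minimal sketch that builds binomial outcomes by hand: it tosses the coin with Bernoulli draws and then counts the heads in each experiment. The names n_experiments, n_tosses, tosses and heads_per_experiment are made up for illustration, and the seed is arbitrary. The draws are random, so no particular output is shown.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

n_experiments <- 100  # plays the role of n
n_tosses <- 2         # plays the role of size

# One Bernoulli draw per toss, arranged with one column per experiment
tosses <- matrix(rbinom(n = n_experiments * n_tosses, size = 1, prob = 0.8),
                 nrow = n_tosses)

# Counting the 1's in each column gives one binomial outcome per experiment
heads_per_experiment <- colSums(tosses)

# Tally how often each number of heads occurred
table(heads_per_experiment)
```

The tally should look much like a tally of rbinom(n = 100, size = 2, prob = 0.8), since both describe the same experiment.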
We can estimate the value of the probability mass function at a particular value by repeating the same experiment many times. For example, suppose we want to know how often the coin comes up heads 0 times when it is weighted with P(heads) = 0.8 and tossed twice per experiment.
Here is the proportion of experiments in which the coin came up heads 0 times:
mean(rbinom(n = 1000000, size = 2, prob = 0.8) == 0)
## [1] 0.039996
We can obtain this directly using dbinom:
dbinom(x = 0, size = 2, prob = 0.8)
## [1] 0.04
dbinom computes the probability mass function for a particular x (the number of times the coin came up heads).
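The value returned by dbinom can be checked against the binomial formula choose(size, x) * prob^x * (1 - prob)^(size - x). The sketch below writes that formula out using base R's choose. The name manual_pmf is a hypothetical helper introduced here for illustration, not a function that R provides.

```r
# Hypothetical helper: the binomial pmf written out by hand
manual_pmf <- function(x, size, prob) {
  choose(size, x) * prob^x * (1 - prob)^(size - x)
}

# For x = 0 and size = 2 the formula reduces to 1 * 1 * 0.2^2
by_hand <- manual_pmf(x = 0, size = 2, prob = 0.8)
built_in <- dbinom(x = 0, size = 2, prob = 0.8)

all.equal(by_hand, built_in)
## [1] TRUE
```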
We can use dbinom for Bernoulli variables as well:
dbinom(x = 1, size = 1, prob = 0.65)
## [1] 0.65
(Of course – the probability of the coin coming up heads once if the probability of heads is 65% is… 65%)
dbinom(x = 0, size = 1, prob = 0.65)
## [1] 0.35
…and the probability of the coin coming up Tails must be 35%.
We can display the probability mass function by computing the value of dbinom for every possible value of x.
We use a trick: dbinom is vectorized, so we can plug in multiple values of x at the same time:
dbinom(x = c(0, 1), size = 1, prob = 0.5)
## [1] 0.5 0.5
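When the values of x cover every possible outcome, the returned probabilities should add up to 1. A quick sanity check for 20 tosses of a fair coin, as a sketch (pmf_values is an illustrative name):

```r
# One probability for each possible number of heads, 0 through 20
pmf_values <- dbinom(x = 0:20, size = 20, prob = 0.5)

length(pmf_values)
## [1] 21

# all.equal allows for floating-point rounding in the sum
all.equal(sum(pmf_values), 1)
## [1] TRUE
```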
We are now ready to display the pmf for a binomial variable with 20 coin tosses and a fair coin:
library(ggplot2)  # provides ggplot(), geom_bar() and aes()
x <- 0:20
df <- data.frame(x = x, prob = dbinom(x = x, size = 20, prob = 0.5))
ggplot(df) +
  geom_bar(mapping = aes(x = x, y = prob), stat = "identity")
We can get the same kind of visualization by actually performing the experiment 10,000 times and tallying the number of times that each outcome occurred:
dat <- data.frame(x = rbinom(n = 10000, size = 20, prob = 0.5))
ggplot(dat) +
  # after_stat(count) replaces the ..count.. notation, which recent ggplot2 versions deprecate
  geom_histogram(mapping = aes(x = x, y = after_stat(count / sum(count))), binwidth = 1)
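The agreement between the simulation and the exact pmf can also be checked numerically rather than by eye. The sketch below tallies simulated outcomes with base R's table and puts the proportions next to the dbinom values. The names draws, empirical, exact, comparison and max_gap are illustrative, and the seed is arbitrary. Since the numbers depend on the random draws, no output is shown.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible

size <- 20
draws <- rbinom(n = 10000, size = size, prob = 0.5)

# factor(..., levels = 0:size) keeps outcomes that never occurred, with count 0
empirical <- as.numeric(table(factor(draws, levels = 0:size))) / length(draws)
exact <- dbinom(x = 0:size, size = size, prob = 0.5)

comparison <- data.frame(x = 0:size, empirical = empirical, exact = exact)
comparison

# Largest absolute difference between simulated and exact probabilities
max_gap <- max(abs(empirical - exact))
max_gap
```

With 10,000 experiments the largest gap should be small, and it should shrink further as n grows.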
One way to compute the probability that the coin comes up heads between 15 and 20 times is to sum the probabilities that it comes up heads exactly 15, 16, 17, 18, 19, and 20 times:
sum(dbinom(x = c(15, 16, 17, 18, 19, 20), size = 20, prob = 0.5))
## [1] 0.02069473
We can also use cumulative probability for that. pbinom computes the cumulative distribution function for binomial variables, that is, the probability that the number of heads is at most q. For example, the probability that the coin will come up heads at most 3 times is:
pbinom(q = 3, size = 20, prob = 0.5) # Note: we use q, not x
## [1] 0.001288414
We can also do this “manually” using dbinom:
sum(dbinom(x = c(0, 1, 2, 3), size = 20, prob = 0.5))
## [1] 0.001288414
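The same relationship holds at every cutoff, not just 3: a running total of the dbinom values reproduces pbinom. A short sketch using base R's cumsum (running_total and cdf are illustrative names):

```r
x <- 0:20

# Running total of the pmf: P(X <= 0), P(X <= 1), ..., P(X <= 20)
running_total <- cumsum(dbinom(x = x, size = 20, prob = 0.5))

# pbinom is vectorized over q, so it returns all 21 cumulative probabilities
cdf <- pbinom(q = x, size = 20, prob = 0.5)

all.equal(running_total, cdf)
## [1] TRUE
```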
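Now, we can compute the probability that a fair coin will come up heads between 15 and 20 times in 20 tosses using pbinom. We subtract the probability of at most 14 heads from the probability of at most 20 heads, which leaves the outcomes 15 through 20: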
pbinom(q = 20, size = 20, prob = 0.5) - pbinom(q = 14, size = 20, prob = 0.5)
## [1] 0.02069473
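Since 20 is the largest possible number of heads, this is the upper tail of the distribution, and pbinom can return it in one call through its lower.tail argument. With lower.tail = FALSE it gives the probability of strictly more than q heads, so q = 14 corresponds to 15 or more. A sketch (upper_tail and by_summing are illustrative names):

```r
# P(X > 14), which for whole numbers of heads is P(X >= 15)
upper_tail <- pbinom(q = 14, size = 20, prob = 0.5, lower.tail = FALSE)

# The same probability obtained by summing the pmf over 15, ..., 20
by_summing <- sum(dbinom(x = 15:20, size = 20, prob = 0.5))

all.equal(upper_tail, by_summing)
## [1] TRUE
```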