replicate
Let’s define a function
square <- function(x){
return(x**2)
}
We could call the function multiple times:
square(5)
## [1] 25
square(5)
## [1] 25
square(5)
## [1] 25
replicate
let’s us automate the process
replicate(3, square(5))
## [1] 25 25 25
A difference is that replicate returns a vector. (Or a matrix.)
Functions don’t need to be involved:
replicate(3, "hi")
## [1] "hi" "hi" "hi"
First, let’s introduce the function rbinom
. When used with size = 1
, rbinom
generates a 0 or a 1, with ptobability prob
rbinom(1, size = 1, prob = 0.5)
## [1] 1
rbinom(1, size = 1, prob = 0.5)
## [1] 1
rbinom(1, size = 1, prob = 0.5)
## [1] 0
If we want to generate several coin tosses at once, we can use
rbinom(5, size = 1, prob = 0.5)
## [1] 0 0 0 1 0
Let’s write CoinToss
, which would act as a “wrapper” for CoinToss
:
CoinToss <- function(ntoss){
### Generates random sequence of 1s and 0s ####
return(rbinom(ntoss, 1, .3))
}
Here is how we would call the function:
CoinToss(5)
## [1] 0 1 1 1 0
Let’s now write a function that would estimate the probability of a 1 (i.e., the coin coming up heads), given some data. We’ll just take the proportion of the time that the coin came up heads.
b_est <-function(seq){
return(mean(seq))
}
We can use this like so:
dat <- CoinToss(10)
dat
## [1] 0 0 1 0 0 0 0 1 1 0
b_est(dat)
## [1] 0.3
Let’s try multiple experiments: first, we’ll generate a bunch of toss sequences:
### list of numbers of tosses ####
n.tosses <- c(5,10,20,50,100,200,300,400,500,600,700,800,900,1000,1200,1500,1600,1700,1800,1900,2000, 4000,6000, 7000,8000,9000, 10000)
exp.res <- sapply(n.tosses, FUN = CoinToss)
exp.res[1]
## [[1]]
## [1] 0 0 1 0 1
exp.res[2]
## [[1]]
## [1] 0 0 1 0 1 0 0 0 0 0
#....
We can now estimate the “bias” of a coin for each of the sequences:
bias.estimates = sapply(exp.res, FUN = b_est)
bias.estimates
## [1] 0.4000000 0.2000000 0.3000000 0.3200000 0.3400000 0.2850000 0.2266667
## [8] 0.2725000 0.3120000 0.2966667 0.3057143 0.2837500 0.2866667 0.3200000
## [15] 0.3091667 0.2980000 0.2868750 0.2994118 0.2994444 0.2884211 0.2965000
## [22] 0.3077500 0.3085000 0.2972857 0.3002500 0.3001111 0.3013000
Let’s plot those:
ggplot(mapping = aes(x=n.tosses, y = bias.estimates)) +
geom_line() + theme(text = element_text(size = 15)) + scale_x_log10()
As you can see, for small numbers of tosses, the estimate fluctuates. Eventually, it settles around 0.3 (the precisely correct number) and doesn’t vary much. That’s because for a small set, it’s easy to get a wrong proportion. Imagine just tossing the coin twice: the only possible estimates would be 0, 0.5, and 1. None of these is close to 0.3.
Here we write the same function as before, but it takes the bias (or weight) of the coin as a parameter
bern_generator <-function(n.tosses, b){
#this is the same as the "coin toss" function, but takes in a 'bias' parameter p
return(rbinom(n.tosses, 1, b))
}
Here, we’ll combine the generation of the data and the estimation of the data
bern_exp_estimator <-function(n.tosses, p){
outcomes <-rbinom(n.tosses, 1, p) # take this out of the functoin
return(mean(outcomes))
}
We’ll repeat the same experiment, this time with a probability of 0.7
p.est = sapply(n.tosses, FUN = bern_exp_estimator, .7)
Let’s now also write a function that returns the total number of “heads”
bin_estimator <-function(n.tosses, p){
#returns the number of "heads" (1s) from a binomi
outcomes <-rbinom(n.tosses, 1, p )
return(sum(outcomes))
}
Here is the experiment to visualize the PMF of a Binomial distribution
n.trials = 10000 #number of times we are running our random process (number of trials)
n.tosses = 50 #number of coins we toss for a single random process (# coins for a single trial)
num.heads <- replicate(n.trials,bin_estimator(n.tosses, .5)) #returns the number of heads, replicated n.tosses times
num.heads.df <- data.frame(num.heads = num.heads)
ggplot(num.heads.df) +
geom_bar(mapping = aes(x = num.heads, y = ..count..))
ggplot(num.heads.df) +
geom_bar(mapping = aes(x = num.heads, y = ..prop..))
Finally, let’s get the CMF:
### get frequency for each entry
freqs <- as.data.frame(table(unlist(num.heads.df)))
### normalize
probs <- freqs$Freq/n.trials
### cumsum function ####
a <- c(1,2,1,3,4)
cumsum(a)
## [1] 1 3 4 7 11
ggplot(freqs, mapping = aes(x=Var1, y = cumsum(Freq/n.trials)))+ geom_point()