replicate

Let’s define a function

square <- function(x){
  return(x**2)
}

We could call the function multiple times:

square(5)
## [1] 25
square(5)
## [1] 25
square(5)
## [1] 25

replicate let’s us automate the process

replicate(3, square(5))
## [1] 25 25 25

A difference is that replicate returns a vector. (Or a matrix.)

Functions don’t need to be involved:

replicate(3, "hi")
## [1] "hi" "hi" "hi"

Generating data

First, let’s introduce the function rbinom. When used with size = 1, rbinom generates a 0 or a 1, with ptobability prob

rbinom(1, size = 1, prob = 0.5)
## [1] 1
rbinom(1, size = 1, prob = 0.5)
## [1] 1
rbinom(1, size = 1, prob = 0.5)
## [1] 0

If we want to generate several coin tosses at once, we can use

rbinom(5, size = 1, prob = 0.5)
## [1] 0 0 0 1 0

Let’s write CoinToss, which would act as a “wrapper” for CoinToss:

CoinToss <- function(ntoss){
  ### Generates random sequence of 1s and 0s ####
  return(rbinom(ntoss, 1, .3))
}

Here is how we would call the function:

CoinToss(5)
## [1] 0 1 1 1 0

Inference

Let’s now write a function that would estimate the probability of a 1 (i.e., the coin coming up heads), given some data. We’ll just take the proportion of the time that the coin came up heads.

b_est <-function(seq){
  return(mean(seq))
}

We can use this like so:

dat <- CoinToss(10)
dat
##  [1] 0 0 1 0 0 0 0 1 1 0
b_est(dat)
## [1] 0.3

Let’s try multiple experiments: first, we’ll generate a bunch of toss sequences:

### list of numbers of tosses ####
n.tosses <- c(5,10,20,50,100,200,300,400,500,600,700,800,900,1000,1200,1500,1600,1700,1800,1900,2000, 4000,6000, 7000,8000,9000, 10000)

exp.res <- sapply(n.tosses, FUN = CoinToss)

exp.res[1]
## [[1]]
## [1] 0 0 1 0 1
exp.res[2]
## [[1]]
##  [1] 0 0 1 0 1 0 0 0 0 0
#....

We can now estimate the “bias” of a coin for each of the sequences:

bias.estimates = sapply(exp.res, FUN = b_est)
bias.estimates
##  [1] 0.4000000 0.2000000 0.3000000 0.3200000 0.3400000 0.2850000 0.2266667
##  [8] 0.2725000 0.3120000 0.2966667 0.3057143 0.2837500 0.2866667 0.3200000
## [15] 0.3091667 0.2980000 0.2868750 0.2994118 0.2994444 0.2884211 0.2965000
## [22] 0.3077500 0.3085000 0.2972857 0.3002500 0.3001111 0.3013000

Let’s plot those:

ggplot(mapping = aes(x=n.tosses, y = bias.estimates)) + 
  geom_line() + theme(text = element_text(size = 15)) + scale_x_log10()

As you can see, for small numbers of tosses, the estimate fluctuates. Eventually, it settles around 0.3 (the precisely correct number) and doesn’t vary much. That’s because for a small set, it’s easy to get a wrong proportion. Imagine just tossing the coin twice: the only possible estimates would be 0, 0.5, and 1. None of these is close to 0.3.

Generating bernouilli data

Here we write the same function as before, but it takes the bias (or weight) of the coin as a parameter

bern_generator <-function(n.tosses, b){
  #this is the same as the "coin toss" function, but takes in a 'bias' parameter p 
  return(rbinom(n.tosses, 1, b))
}

Here, we’ll combine the generation of the data and the estimation of the data

bern_exp_estimator <-function(n.tosses, p){
  outcomes <-rbinom(n.tosses, 1, p) # take this out of the functoin
  return(mean(outcomes))
}

We’ll repeat the same experiment, this time with a probability of 0.7

p.est = sapply(n.tosses, FUN = bern_exp_estimator, .7)

Let’s now also write a function that returns the total number of “heads”

bin_estimator <-function(n.tosses, p){
  #returns the number of "heads" (1s) from a binomi
  outcomes <-rbinom(n.tosses, 1, p )
  return(sum(outcomes))
}

Here is the experiment to visualize the PMF of a Binomial distribution

n.trials = 10000 #number of times we are running our random process (number of trials)
n.tosses = 50 #number of coins we toss for a single random process (# coins for a single trial)

num.heads <- replicate(n.trials,bin_estimator(n.tosses, .5)) #returns the number of heads, replicated n.tosses times

num.heads.df <- data.frame(num.heads = num.heads)
ggplot(num.heads.df) + 
  geom_bar(mapping = aes(x = num.heads, y = ..count..))

ggplot(num.heads.df) + 
  geom_bar(mapping = aes(x = num.heads, y = ..prop..))

Finally, let’s get the CMF:

### get frequency for each entry
freqs <- as.data.frame(table(unlist(num.heads.df)))
### normalize
probs <- freqs$Freq/n.trials

### cumsum function ####
a <- c(1,2,1,3,4)
cumsum(a)
## [1]  1  3  4  7 11
ggplot(freqs, mapping = aes(x=Var1, y = cumsum(Freq/n.trials)))+ geom_point()