---
title: "Week 7 Lecture 1"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library("ggplot2")
library("dplyr")
```
### `replicate`
Let's define a function
```{r}
square <- function(x){
return(x**2)
}
```
We could call the function multiple times:
```{r}
square(5)
square(5)
square(5)
```
`replicate` let's us automate the process
```{r}
replicate(3, square(5))
```
A difference is that replicate returns a vector. (Or a matrix.)
Functions don't need to be involved:
```{r}
replicate(3, "hi")
```
### Generating data
First, let's introduce the function `rbinom`. When used with `size = 1`, `rbinom` generates a 0 or a 1, with ptobability `prob`
```{r}
rbinom(1, size = 1, prob = 0.5)
rbinom(1, size = 1, prob = 0.5)
rbinom(1, size = 1, prob = 0.5)
```
If we want to generate several coin tosses at once, we can use
```{r}
rbinom(5, size = 1, prob = 0.5)
```
Let's write `CoinToss`, which would act as a "wrapper" for `CoinToss`:
```{r}
CoinToss <- function(ntoss){
### Generates random sequence of 1s and 0s ####
return(rbinom(ntoss, 1, .3))
}
```
Here is how we would call the function:
```{r}
CoinToss(5)
```
### Inference
Let's now write a function that would estimate the probability of a 1 (i.e., the coin coming up heads), given some data. We'll just take the proportion of the time that the coin came up heads.
```{r}
b_est <-function(seq){
return(mean(seq))
}
```
We can use this like so:
```{r}
dat <- CoinToss(10)
dat
b_est(dat)
```
Let's try multiple experiments: first, we'll generate a bunch of toss sequences:
```{r}
### list of numbers of tosses ####
n.tosses <- c(5,10,20,50,100,200,300,400,500,600,700,800,900,1000,1200,1500,1600,1700,1800,1900,2000, 4000,6000, 7000,8000,9000, 10000)
exp.res <- sapply(n.tosses, FUN = CoinToss)
exp.res[1]
exp.res[2]
#....
```
We can now estimate the "bias" of a coin for each of the sequences:
```{r}
bias.estimates = sapply(exp.res, FUN = b_est)
bias.estimates
```
Let's plot those:
```{r}
ggplot(mapping = aes(x=n.tosses, y = bias.estimates)) +
geom_line() + theme(text = element_text(size = 15)) + scale_x_log10()
```
As you can see, for small numbers of tosses, the estimate fluctuates. Eventually, it settles around 0.3 (the precisely correct number) and doesn't vary much. That's because for a small set, it's easy to get a wrong proportion. Imagine just tossing the coin twice: the only possible estimates would be 0, 0.5, and 1. None of these is close to 0.3.
### Generating bernouilli data
Here we write the same function as before, but it takes the bias (or weight) of the coin as a parameter
```{r}
bern_generator <-function(n.tosses, b){
#this is the same as the "coin toss" function, but takes in a 'bias' parameter p
return(rbinom(n.tosses, 1, b))
}
```
Here, we'll combine the generation of the data and the estimation of the data
```{r}
bern_exp_estimator <-function(n.tosses, p){
outcomes <-rbinom(n.tosses, 1, p) # take this out of the functoin
return(mean(outcomes))
}
```
We'll repeat the same experiment, this time with a probability of 0.7
```{r}
p.est = sapply(n.tosses, FUN = bern_exp_estimator, .7)
```
Let's now also write a function that returns the total number of "heads"
```{r}
bin_estimator <-function(n.tosses, p){
#returns the number of "heads" (1s) from a binomi
outcomes <-rbinom(n.tosses, 1, p )
return(sum(outcomes))
}
```
Here is the experiment to visualize the PMF of a Binomial distribution
```{r}
n.trials = 10000 #number of times we are running our random process (number of trials)
n.tosses = 50 #number of coins we toss for a single random process (# coins for a single trial)
num.heads <- replicate(n.trials,bin_estimator(n.tosses, .5)) #returns the number of heads, replicated n.tosses times
num.heads.df <- data.frame(num.heads = num.heads)
ggplot(num.heads.df) +
geom_bar(mapping = aes(x = num.heads, y = ..count..))
ggplot(num.heads.df) +
geom_bar(mapping = aes(x = num.heads, y = ..prop..))
```
Finally, let's get the CMF:
```{r}
### get frequency for each entry
freqs <- as.data.frame(table(unlist(num.heads.df)))
### normalize
probs <- freqs$Freq/n.trials
### cumsum function ####
a <- c(1,2,1,3,4)
cumsum(a)
ggplot(freqs, mapping = aes(x=Var1, y = cumsum(Freq/n.trials)))+ geom_point()
```