---
title: "Week 7 Lecture 1"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library("ggplot2")
library("dplyr")
```

### `replicate`

Let's define a function

```{r}
square <- function(x){
  return(x**2)
}
```

We could call the function multiple times:

```{r}
square(5)
square(5)
square(5)
```

`replicate` let's us automate the process

```{r}
replicate(3, square(5))
```

A difference is that replicate returns a vector. (Or a matrix.)

Functions don't need to be involved:

```{r}
replicate(3, "hi")
```


### Generating data

First, let's introduce the function `rbinom`. When used with `size = 1`, `rbinom` generates a 0 or a 1, with ptobability `prob`

```{r}
rbinom(1, size = 1, prob = 0.5)
rbinom(1, size = 1, prob = 0.5)
rbinom(1, size = 1, prob = 0.5)
```

If we want to generate several coin tosses at once, we can use

```{r}
rbinom(5, size = 1, prob = 0.5)
```

Let's write `CoinToss`, which would act as a "wrapper" for `CoinToss`:

```{r}
CoinToss <- function(ntoss){
  ### Generates random sequence of 1s and 0s ####
  return(rbinom(ntoss, 1, .3))
}
```

Here is how we would call the function:

```{r}
CoinToss(5)
```

### Inference

Let's now write a function that would estimate the probability of a 1 (i.e., the coin coming up heads), given some data. We'll just take the proportion of the time that the coin came up heads.

```{r}
b_est <-function(seq){
  return(mean(seq))
}
```


We can use this like so:
```{r}
dat <- CoinToss(10)
dat
b_est(dat)
```

Let's try multiple experiments: first, we'll generate a bunch of toss sequences:

```{r}
### list of numbers of tosses ####
n.tosses <- c(5,10,20,50,100,200,300,400,500,600,700,800,900,1000,1200,1500,1600,1700,1800,1900,2000, 4000,6000, 7000,8000,9000, 10000)

exp.res <- sapply(n.tosses, FUN = CoinToss)

exp.res[1]
exp.res[2]
#....
```

We can now estimate the "bias" of a coin for each of the sequences:

```{r}
bias.estimates = sapply(exp.res, FUN = b_est)
bias.estimates
```

Let's plot those:

```{r}
ggplot(mapping = aes(x=n.tosses, y = bias.estimates)) + 
  geom_line() + theme(text = element_text(size = 15)) + scale_x_log10()
```

As you can see, for small numbers of tosses, the estimate fluctuates. Eventually, it settles around 0.3 (the precisely correct number) and doesn't vary much. That's because for a small set, it's easy to get a wrong proportion. Imagine just tossing the coin twice: the only possible estimates would be 0, 0.5, and 1. None of these is close to 0.3.


### Generating bernouilli data

Here we write the same function as before, but it takes the bias (or weight) of the coin as a parameter

```{r}
bern_generator <-function(n.tosses, b){
  #this is the same as the "coin toss" function, but takes in a 'bias' parameter p 
  return(rbinom(n.tosses, 1, b))
}
```


Here, we'll combine the generation of the data and the estimation of the data

```{r}
bern_exp_estimator <-function(n.tosses, p){
  outcomes <-rbinom(n.tosses, 1, p) # take this out of the functoin
  return(mean(outcomes))
}
```


We'll repeat the same experiment, this time with a probability of 0.7

```{r}
p.est = sapply(n.tosses, FUN = bern_exp_estimator, .7)
```


Let's now also write a function that returns the total number of "heads"

```{r}
bin_estimator <-function(n.tosses, p){
  #returns the number of "heads" (1s) from a binomi
  outcomes <-rbinom(n.tosses, 1, p )
  return(sum(outcomes))
}
```

Here is the experiment to visualize the PMF of a Binomial distribution
```{r}
n.trials = 10000 #number of times we are running our random process (number of trials)
n.tosses = 50 #number of coins we toss for a single random process (# coins for a single trial)

num.heads <- replicate(n.trials,bin_estimator(n.tosses, .5)) #returns the number of heads, replicated n.tosses times

num.heads.df <- data.frame(num.heads = num.heads)
ggplot(num.heads.df) + 
  geom_bar(mapping = aes(x = num.heads, y = ..count..))

ggplot(num.heads.df) + 
  geom_bar(mapping = aes(x = num.heads, y = ..prop..))

```

Finally, let's get the CMF:

```{r}
### get frequency for each entry
freqs <- as.data.frame(table(unlist(num.heads.df)))
### normalize
probs <- freqs$Freq/n.trials

### cumsum function ####
a <- c(1,2,1,3,4)
cumsum(a)

ggplot(freqs, mapping = aes(x=Var1, y = cumsum(Freq/n.trials)))+ geom_point()
```