--- title: "Week 7 Lecture 1" output: html_document --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) library("ggplot2") library("dplyr") ``` ### `replicate` Let's define a function ```{r} square <- function(x){ return(x**2) } ``` We could call the function multiple times: ```{r} square(5) square(5) square(5) ``` `replicate` let's us automate the process ```{r} replicate(3, square(5)) ``` A difference is that replicate returns a vector. (Or a matrix.) Functions don't need to be involved: ```{r} replicate(3, "hi") ``` ### Generating data First, let's introduce the function `rbinom`. When used with `size = 1`, `rbinom` generates a 0 or a 1, with ptobability `prob` ```{r} rbinom(1, size = 1, prob = 0.5) rbinom(1, size = 1, prob = 0.5) rbinom(1, size = 1, prob = 0.5) ``` If we want to generate several coin tosses at once, we can use ```{r} rbinom(5, size = 1, prob = 0.5) ``` Let's write `CoinToss`, which would act as a "wrapper" for `CoinToss`: ```{r} CoinToss <- function(ntoss){ ### Generates random sequence of 1s and 0s #### return(rbinom(ntoss, 1, .3)) } ``` Here is how we would call the function: ```{r} CoinToss(5) ``` ### Inference Let's now write a function that would estimate the probability of a 1 (i.e., the coin coming up heads), given some data. We'll just take the proportion of the time that the coin came up heads. ```{r} b_est <-function(seq){ return(mean(seq)) } ``` We can use this like so: ```{r} dat <- CoinToss(10) dat b_est(dat) ``` Let's try multiple experiments: first, we'll generate a bunch of toss sequences: ```{r} ### list of numbers of tosses #### n.tosses <- c(5,10,20,50,100,200,300,400,500,600,700,800,900,1000,1200,1500,1600,1700,1800,1900,2000, 4000,6000, 7000,8000,9000, 10000) exp.res <- sapply(n.tosses, FUN = CoinToss) exp.res[1] exp.res[2] #.... ``` We can now estimate the "bias" of a coin for each of the sequences: ```{r} bias.estimates = sapply(exp.res, FUN = b_est) bias.estimates ``` Let's plot those: ```{r} ggplot(mapping = aes(x=n.tosses, y = bias.estimates)) + geom_line() + theme(text = element_text(size = 15)) + scale_x_log10() ``` As you can see, for small numbers of tosses, the estimate fluctuates. Eventually, it settles around 0.3 (the precisely correct number) and doesn't vary much. That's because for a small set, it's easy to get a wrong proportion. Imagine just tossing the coin twice: the only possible estimates would be 0, 0.5, and 1. None of these is close to 0.3. ### Generating bernouilli data Here we write the same function as before, but it takes the bias (or weight) of the coin as a parameter ```{r} bern_generator <-function(n.tosses, b){ #this is the same as the "coin toss" function, but takes in a 'bias' parameter p return(rbinom(n.tosses, 1, b)) } ``` Here, we'll combine the generation of the data and the estimation of the data ```{r} bern_exp_estimator <-function(n.tosses, p){ outcomes <-rbinom(n.tosses, 1, p) # take this out of the functoin return(mean(outcomes)) } ``` We'll repeat the same experiment, this time with a probability of 0.7 ```{r} p.est = sapply(n.tosses, FUN = bern_exp_estimator, .7) ``` Let's now also write a function that returns the total number of "heads" ```{r} bin_estimator <-function(n.tosses, p){ #returns the number of "heads" (1s) from a binomi outcomes <-rbinom(n.tosses, 1, p ) return(sum(outcomes)) } ``` Here is the experiment to visualize the PMF of a Binomial distribution ```{r} n.trials = 10000 #number of times we are running our random process (number of trials) n.tosses = 50 #number of coins we toss for a single random process (# coins for a single trial) num.heads <- replicate(n.trials,bin_estimator(n.tosses, .5)) #returns the number of heads, replicated n.tosses times num.heads.df <- data.frame(num.heads = num.heads) ggplot(num.heads.df) + geom_bar(mapping = aes(x = num.heads, y = ..count..)) ggplot(num.heads.df) + geom_bar(mapping = aes(x = num.heads, y = ..prop..)) ``` Finally, let's get the CMF: ```{r} ### get frequency for each entry freqs <- as.data.frame(table(unlist(num.heads.df))) ### normalize probs <- freqs$Freq/n.trials ### cumsum function #### a <- c(1,2,1,3,4) cumsum(a) ggplot(freqs, mapping = aes(x=Var1, y = cumsum(Freq/n.trials)))+ geom_point() ```