---
title: "Precept 6 Problem Set"
output:
  html_document:
    df_print: paged
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

### Problem 1: `replicate`

Recall that this is how you can use `replicate` to repeatedly run the same experiment.

```{r}
res <- replicate(10, sample(c(1, 2, 3, 4)))
res
```

Here, we ran `sample(c(1, 2, 3, 4))` 10 times. Each column represents the result of one experiment.

You will usually just obtain a single number from one experiment. Here is an example:

```{r}
replicate(10, mean(sample(c(1, 2, 3, 4), size = 2)))
```

Repeatedly sample a training set of size 25 from `titanic`, and create two histograms: one for the performances (i.e., CCRs) on the training set, and one for the performances (i.e., CCRs) on the test set. You should use `ggplot`'s `geom_histogram` geom.

#### Hints and suggestions

The [Precept 5 solutions](http://guerzhoy.mycpanel.princeton.edu/201s19/pre/P5/p5_soln.html) (see the linked [videos](http://guerzhoy.mycpanel.princeton.edu/201s19/index.html#assignments) as well if you like) should be helpful, as should [the code](http://guerzhoy.mycpanel.princeton.edu/201s19/lec/W07/lec1/probintro.R) from the Tuesday lecture.

Here is a suggestion for how to proceed:

* First, write a function that samples a small training set, fits a model on it, and computes the performance on the small training set as well as the validation set.
* Second, use something like `replicate(1000, func(arg1, arg2))` to get a matrix like the one in the Precept 5 solution.
* Observe that you get the same kind of thing as what we got in Precept 5.
* Make histograms (rather than curves, as in Precept 5).

### Problem 2

Work on your project. If your project partner is not the same as your precept partner, please work separately.
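
#### Problem 1: a sketch of the suggested steps

The steps suggested for Problem 1 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the course solution: `toy` is simulated data standing in for `titanic`, the column names `Male`, `Fare`, and `Survived` are hypothetical, and the logistic regression with a 0.5 cutoff is just one possible choice of model. The helper names `ccr` and `one_experiment` are made up for this example.

```r
set.seed(201)

# ASSUMPTION: simulated stand-in for `titanic` (not the real data).
n <- 800
toy <- data.frame(
  Male = rbinom(n, size = 1, prob = 0.5),
  Fare = rexp(n, rate = 1 / 30)
)
p_survive <- plogis(1 - 2 * toy$Male + 0.01 * toy$Fare)
toy$Survived <- rbinom(n, size = 1, prob = p_survive)

# Correct classification rate of a fitted model on `dat`,
# predicting survival when the fitted probability exceeds 0.5.
ccr <- function(fit, dat) {
  pred <- predict(fit, newdata = dat, type = "response") > 0.5
  mean(pred == (dat$Survived == 1))
}

# One experiment: sample a training set of size `n_train`, fit a model,
# and return the CCR on the training rows and on the held-out rows.
one_experiment <- function(dat, n_train) {
  idx   <- sample(nrow(dat), n_train)
  train <- dat[idx, ]
  test  <- dat[-idx, ]
  fit   <- glm(Survived ~ Male + Fare, family = binomial, data = train)
  c(train = ccr(fit, train), test = ccr(fit, test))
}

# Each call returns two numbers, so replicate() stacks the results as
# columns of a 2 x 1000 matrix. Tiny training sets can trigger glm
# warnings, which are silenced here.
res <- suppressWarnings(replicate(1000, one_experiment(toy, 25)))

# Reshape to long format: as.vector() reads the matrix column by column,
# so the labels alternate train, test, train, test, ...
ccrs <- data.frame(
  set = rep(rownames(res), times = ncol(res)),
  ccr = as.vector(res)
)

# One histogram panel per set (drawn only if ggplot2 is installed).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(ccrs, aes(x = ccr)) +
    geom_histogram(bins = 20) +
    facet_wrap(~ set, ncol = 1)
  print(p)
}
```

Returning a named vector from `one_experiment` gives the matrix row names, which makes it easy to label the two histograms. With the real data, the same structure should apply once `toy` and the model formula are swapped for `titanic` and the model used in Precept 5.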