Problem 1: replicate

Recall that this is how you can use replicate to repeatedly run the same experiment:

res <- replicate(10, sample(c(1, 2, 3, 4)))
res
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    3    2    1    4    3    2    1    1    4     4
## [2,]    1    4    3    1    2    4    3    4    3     1
## [3,]    2    1    4    3    1    1    4    3    1     3
## [4,]    4    3    2    2    4    3    2    2    2     2

Here, we ran sample(c(1, 2, 3, 4)) 10 times. Each column holds the result of one experiment. You will usually just obtain a single number from one experiment, in which case replicate returns a vector rather than a matrix. Here is an example:

replicate(10, mean(sample(c(1, 2, 3, 4), size = 2)))
##  [1] 1.5 2.5 1.5 2.0 2.5 1.5 2.5 2.0 1.5 1.5
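
When one experiment returns more than one number, replicate stacks the results into a matrix with one column per run, as in the first example. The sketch below illustrates this with a hypothetical helper, one_run, which is not part of the precept code; the seed is also an assumption, set only so the run is reproducible.

```r
set.seed(1)  # assumed seed, only for reproducibility

# Hypothetical experiment: returns two named summary numbers per run
one_run <- function() {
  x <- sample(c(1, 2, 3, 4), size = 2)
  c(mean = mean(x), max = max(x))
}

res2 <- replicate(10, one_run())

dim(res2)        # 2 rows (one per returned number), 10 columns (one per run)
rownames(res2)   # row names come from the names of the returned vector
res2["mean", ]   # pull out one row as a plain vector of length 10
```

Naming the elements of the returned vector makes it easy to pick out one quantity across all runs by row name.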

Repeatedly sample a training set of size 25 from titanic, and create two histograms: one for the performances (i.e., CCRs) on the training set, and one for the performances (i.e., CCRs) on the test set. You should use ggplot’s geom_histogram geom.

Hints and suggestions

The Precept 5 solutions (see the linked videos as well if you like) should be helpful, as should the code from the Tuesday lecture.

Here is a suggestion for how to proceed:

  • First, write a function that samples a small training set, fits a model on it, and computes the performance on the small training set as well as the validation set.

  • Second, use something like replicate(1000, func(arg1, arg2)) to get a matrix like the one in the Precept 5 solution.

  • Observe that you get the same kind of thing as what we got in Precept 5

  • Make histograms (rather than curves, as in Precept 5)
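
The steps above can be sketched roughly as follows. This is only an outline under stated assumptions: it uses simulated data as a stand-in for titanic, a logistic regression with a single made-up predictor x, a 0.5 classification threshold, and hypothetical helper names (ccr, one_experiment). Your own model, variables, and train/validation split will differ.

```r
set.seed(42)  # assumed seed, only for reproducibility

# Simulated stand-in data (NOT the titanic data): binary outcome y, predictor x
n <- 500
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, size = 1, prob = plogis(1.5 * dat$x))

# Hold out a fixed validation set; training sets are sampled from the rest
valid_idx <- sample(n, 200)
valid <- dat[valid_idx, ]
pool  <- dat[-valid_idx, ]

# Correct classification rate of a fitted model on a data set
ccr <- function(fit, data) {
  pred <- predict(fit, newdata = data, type = "response") > 0.5
  mean(pred == (data$y == 1))
}

# One experiment: sample a small training set, fit, score on both sets
one_experiment <- function(pool, valid, train_size = 25) {
  train <- pool[sample(nrow(pool), train_size), ]
  # Tiny training sets can trigger glm separation warnings; silenced here
  fit <- suppressWarnings(glm(y ~ x, family = binomial, data = train))
  c(train = ccr(fit, train), valid = ccr(fit, valid))
}

# 2 x 1000 matrix: one column per experiment, rows "train" and "valid"
res <- replicate(1000, one_experiment(pool, valid))

# Long-format data frame so that each set gets its own histogram panel
df <- data.frame(
  ccr = c(res["train", ], res["valid", ]),
  set = rep(c("train", "valid"), each = ncol(res))
)

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = ccr)) +
    geom_histogram(bins = 20) +
    facet_wrap(~ set, ncol = 1)
  print(p)
}
```

Faceting by set is one way to get the two histograms on a shared x-axis so they can be compared directly; two separate ggplot calls would work as well.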

Problem 2

Work on your project. If your project partner is not the same as your precept partner, please work separately.