There is nothing to submit on Monday this week. You will be graded on making reasonable effort toward completing the assignment in precept.
Write a function that takes a data frame in the same format as the Titanic dataset and returns the percentage (i.e., a number between 0 and 100) of people who did not survive.
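As a starting point, here is a minimal sketch of the idea on a toy data frame. The column name `Survived` (1 = survived, 0 = did not survive) and the function name `percent_not_survived` are assumptions made for illustration; check them against the actual dataset.

```r
# Toy stand-in for the Titanic data, invented for illustration.
# Assumption: survival is stored in a column named `Survived`,
# coded 1 = survived, 0 = did not survive.
toy <- data.frame(Survived = c(0, 1, 0, 0))

# The mean of a logical vector is the fraction of TRUE values,
# so multiplying by 100 gives a percentage between 0 and 100.
percent_not_survived <- function(df) {
  100 * mean(df$Survived == 0)
}

percent_not_survived(toy)
## [1] 75
```

Three of the four toy rows have `Survived == 0`, which is where the 75 comes from.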
You can use strsplit to split character strings into words. For example, the following splits a character string into words, assuming that the words are separated by a space:
words <- strsplit("Go Tigers", " ")[[1]]
words
## [1] "Go" "Tigers"
words[1]
## [1] "Go"
words[2]
## [1] "Tigers"
Write a function that takes in the name of a person (as it appears in the Titanic dataset) and returns the person's last name.
Add a column to the titanic dataset that includes the last name of the person. You should use sapply and your function from Problem 2.
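The following sketch shows how sapply applies a one-name-at-a-time function to a whole column. It uses a toy data frame and assumes that names are stored in a column called `Name` in the form "Last, Title First"; the helper `last_name` and the new column `LastName` are hypothetical names, not part of the dataset.

```r
# Toy data, invented for illustration.
# Assumptions: names are in a column called `Name` and look like
# "Last, Title First" (the real data may differ).
toy <- data.frame(
  Name = c("Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: everything before the first ", " is the last name.
# as.character() guards against the column having been read as a factor.
last_name <- function(name) {
  strsplit(as.character(name), ", ", fixed = TRUE)[[1]][1]
}

# sapply calls last_name once per element and collects the results.
# USE.NAMES = FALSE drops the element names sapply would otherwise
# attach when its input is a character vector.
toy$LastName <- sapply(toy$Name, last_name, USE.NAMES = FALSE)
toy$LastName
```

On the toy data, the new column contains "Braund" and "Heikkinen".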
Use logistic regression to predict survival using the last name of a person. Are you able to obtain a better accuracy than the baseline classifier? Compute and compare the false positive rate (FPR), false negative rate (FNR), and the positive predictive value (PPV).
The definitions are as follows (note: there was a think-o in lecture; follow the definitions below):
\[FPR = \frac{\text{\# of times the model said "positive" and was wrong}}{\text{\# of negatives}}\]
\[FNR = \frac{\text{\# of times the model said "negative" and was wrong}}{\text{\# of positives}}\]
\[PPV = \frac{\text{\# of times the model said "positive" and was correct}}{\text{\# of times the model said "positive"}}\]
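To make the three definitions concrete, here is a small sketch that computes them from two toy vectors. The vectors `actual` and `predicted` are invented for illustration (1 = "positive", 0 = "negative"); in the assignment, they would come from the data and from your model's thresholded predictions.

```r
# Toy labels and predictions, invented for illustration.
actual    <- c(1, 1, 1, 0, 0, 0, 0, 1)
predicted <- c(1, 0, 1, 1, 0, 0, 0, 1)

false_pos <- sum(predicted == 1 & actual == 0)  # said "positive", was wrong
false_neg <- sum(predicted == 0 & actual == 1)  # said "negative", was wrong
true_pos  <- sum(predicted == 1 & actual == 1)  # said "positive", was correct

fpr <- false_pos / sum(actual == 0)     # denominator: number of negatives
fnr <- false_neg / sum(actual == 1)     # denominator: number of positives
ppv <- true_pos  / sum(predicted == 1)  # denominator: number of "positive" calls

c(FPR = fpr, FNR = fnr, PPV = ppv)
```

Here there is one false positive among four negatives, one false negative among four positives, and three correct "positive" calls out of four, so FPR = 0.25, FNR = 0.25, and PPV = 0.75.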
Can you come up with a theory that would explain why you can predict survival using the last name?
Use ggplot to visualize the relationship between the fare and the class, as well as the sex of the passenger. Do you see any patterns?
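One possible way to show a numeric variable against two categorical ones is a grouped boxplot. The sketch below uses a toy data frame; the column names `Pclass`, `Sex`, and `Fare` are assumptions about the real dataset, and a boxplot is only one of several reasonable choices.

```r
library(ggplot2)

# Toy data, invented for illustration.
# Assumption: the real data has columns named Pclass, Sex, and Fare.
toy <- data.frame(
  Pclass = c(1, 1, 2, 2, 3, 3, 1, 3),
  Sex    = c("female", "male", "female", "male",
             "female", "male", "male", "female"),
  Fare   = c(80, 52, 26, 13, 8, 7, 71, 10)
)

# One group of boxes per class, split by sex.
# factor() makes the numeric class code a discrete axis.
fare_plot <- ggplot(toy, aes(x = factor(Pclass), y = Fare, fill = Sex)) +
  geom_boxplot() +
  labs(x = "Class", y = "Fare")
fare_plot
```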
Use ggplot to visualize the relationship between the age of a passenger and their probability of survival, based on a model that uses the passenger’s age, sex, class, and fare, as well as based on a model that uses just the passenger’s age. Superimpose the two plots.
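The sketch below illustrates one way to superimpose predictions from two logistic regressions in a single ggplot. It runs on simulated data, so the column names (`Age`, `Sex`, `Pclass`, `Fare`, `Survived`) and the simulated relationship are assumptions for illustration only and say nothing about the real Titanic data.

```r
library(ggplot2)

# Simulated data, invented for illustration; column names are assumptions.
set.seed(1)
n <- 200
toy <- data.frame(
  Age    = runif(n, 1, 70),
  Sex    = sample(c("female", "male"), n, replace = TRUE),
  Pclass = sample(1:3, n, replace = TRUE),
  Fare   = runif(n, 5, 100)
)
toy$Survived <- rbinom(
  n, 1, plogis(1 - 0.02 * toy$Age - 1.5 * (toy$Sex == "male"))
)

# Two logistic regressions: all four predictors vs. age alone.
fit_full <- glm(Survived ~ Age + Sex + Pclass + Fare,
                family = binomial, data = toy)
fit_age  <- glm(Survived ~ Age, family = binomial, data = toy)

# type = "response" returns probabilities rather than log-odds.
toy$p_full <- predict(fit_full, type = "response")
toy$p_age  <- predict(fit_age, type = "response")

# Superimpose: points for the full model (its predictions vary at a
# given age), a line for the age-only model (one value per age).
survival_plot <- ggplot(toy, aes(x = Age)) +
  geom_point(aes(y = p_full, colour = "age, sex, class, fare")) +
  geom_line(aes(y = p_age, colour = "age only")) +
  labs(y = "Predicted probability of survival", colour = "Model")
survival_plot
```

Putting both layers in one ggplot call, with a constant string mapped to colour in each layer, yields a shared legend that distinguishes the two models.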
Write a function that computes the total profit (or loss) if the insurance agent uses a logistic regression model based on the sex, class, and age of a person to predict whether they will survive, for the population of people on the Titanic (note: this is “cheating,” since in reality an agent would not have access to the data to fit their model; there are also other issues here, which we will discuss). The insurance agent uses the following procedure:
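If the predicted probability of survival is less than p, turn them away.
If the predicted probability of survival is at least p, sell them insurance for premium.
If an insured person does not survive, pay benefit to the estate.
Find reasonable p, premium, and benefit which would yield a profit in the case of the Titanic.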
(Again, note: this is an exercise, and not a realistic example. Most liners did not sink; no insurer would sell insurance if they knew the liner would sink. In fact, insurers often refuse to sell insurance when the probability of a bad outcome is not very small.)
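The bookkeeping behind the profit calculation can be sketched as follows. This is only an illustration under stated assumptions: the function name `total_profit` is hypothetical, the rule is read as "insure anyone whose predicted survival probability is at least p", and the probabilities are passed in directly rather than fitted. In the assignment they would come from a logistic regression, for example via predict with type = "response".

```r
# Hypothetical profit calculation.
# Assumption: a person is insured when their predicted probability of
# survival is at least p; everyone else is turned away.
total_profit <- function(prob_survive, survived, p, premium, benefit) {
  insured  <- prob_survive >= p              # who is sold a policy
  n_claims <- sum(insured & survived == 0)   # insured people who died
  sum(insured) * premium - n_claims * benefit
}

# Toy inputs, invented for illustration.
prob_survive <- c(0.9, 0.8, 0.3, 0.7, 0.2)
survived     <- c(1,   1,   0,   0,   1)

total_profit(prob_survive, survived, p = 0.5, premium = 100, benefit = 250)
## [1] 50
```

With these toy numbers, three people are insured (300 in premiums) and one of them dies (250 paid out), leaving a profit of 50. Raising p insures fewer people, which lowers both premium income and expected payouts.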