titanic <- read.csv("http://guerzhoy.princeton.edu/201s20/titanic.csv")

Predicting “Survived” or “Died”

As mentioned before, we can predict “Survived” whenever the predicted probability of survival is greater than or equal to 0.5.

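# Logistic regression of survival on age, sex, and passenger class
fit <- glm(Survived ~ Age + Sex + Pclass, data = titanic, family = "binomial")
# pred is TRUE ("Survived") when the predicted probability is at least 0.5
titanic[, "pred"] <- predict(fit, newdata = titanic, type = "response") >= .5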

(Note that this is the same as predicting “Survived” whenever \(a_0 + a_1 x_1 + \cdots\) is greater than or equal to 0: since \(\sigma(0) = 0.5\) and \(\sigma\) is increasing, \(\sigma(z) \geq 0.5\) exactly when \(z \geq 0\).)
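One way to convince yourself of this equivalence is to compare the two thresholds directly. The sketch below is only an illustration: it fits a model to simulated data (the variables `x`, `y`, and `toy_fit` are hypothetical and exist only so the code runs without downloading the CSV), then compares `predict(..., type = "response") >= 0.5` with `predict(..., type = "link") >= 0`. In R, `plogis()` is the logistic function \(\sigma\).

```r
# Hypothetical simulated data, used only so this sketch is self-contained
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 2 * x))
toy_fit <- glm(y ~ x, family = "binomial")

# Threshold the probability sigma(z) at 0.5 ...
pred_prob <- predict(toy_fit, type = "response") >= 0.5
# ... or threshold the linear predictor z = a0 + a1 * x at 0
pred_link <- predict(toy_fit, type = "link") >= 0

# The two rules agree on every observation
all(pred_prob == pred_link)
```

With the Titanic model, `predict(fit, newdata = titanic, type = "link") >= 0` should give the same predictions as the `type = "response"` version above.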

The “baseline” classifier

The simplest possible classifier predicts the same outcome (in our case, “did not survive”) every time. To see whether our model beats this baseline, we first compute how often our logistic regression classifier is correct:

mean(titanic$pred == titanic$Survived)
## [1] 0.794814

Now we compute how often the baseline classifier is correct, which is simply the fraction of passengers who did not survive:

mean(titanic$Survived == 0)
## [1] 0.6144307
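The baseline idea works for any outcome: always guess the most common value, so the baseline is correct exactly as often as that value occurs. A minimal sketch, assuming a hypothetical helper `baseline_predict()` (not part of base R) and a made-up toy outcome vector:

```r
# Hypothetical helper: predict the most common value of `actual` for everyone
baseline_predict <- function(actual) {
  values <- sort(unique(actual))
  counts <- tabulate(match(actual, values))  # how often each value occurs
  most_common <- values[which.max(counts)]
  rep(most_common, length(actual))
}

# Made-up outcomes (1 = survived, 0 = died)
toy_survived <- c(0, 0, 0, 1, 1, 0, 1, 0)
baseline_pred <- baseline_predict(toy_survived)

# Accuracy of the baseline = share of the most common outcome (here 5/8)
mean(baseline_pred == toy_survived)
```

Applied to `titanic$Survived`, the most common value is 0, so `mean(baseline_predict(titanic$Survived) == titanic$Survived)` reproduces the baseline accuracy computed above.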

Other measures of how good a classifier/predictor/model is

False positive rate

The false positive rate is the fraction of negative examples for which the model outputs “positive” (here, the model says “Survived” when the person did not survive):

\[FPR = \frac{\text{number of times the model said "positive" and was wrong}}{\text{number of negatives}}\]

sum(titanic$pred == 1 & titanic$Survived == 0)/sum(titanic$Survived == 0)
## [1] 0.1504587

False negative rate

The false negative rate is the fraction of positive examples for which the model outputs “negative” (here, the model says “Did not survive” when the person actually survived):

\[FNR = \frac{\text{number of times the model said "negative" and was wrong}}{\text{number of positives}}\]

sum(titanic$pred == 0 & titanic$Survived == 1)/sum(titanic$Survived == 1)
## [1] 0.2923977
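Both rates can also be read off a confusion matrix, which counts every combination of predicted and actual outcome. Here is a small sketch using base R's `table()` on made-up toy vectors (`toy_pred` and `toy_actual` are hypothetical, chosen so the counts are easy to check by hand):

```r
# Made-up predictions and outcomes (1 = survived, 0 = died)
toy_pred   <- c(1, 0, 0, 1, 1, 0, 0, 1, 0, 0)
toy_actual <- c(1, 0, 1, 0, 1, 0, 0, 1, 1, 0)

# Rows: what the model predicted; columns: what actually happened
cm <- table(predicted = toy_pred, actual = toy_actual)
cm

# FPR: among actual negatives (column "0"), the share predicted positive
fpr <- cm["1", "0"] / sum(cm[, "0"])
# FNR: among actual positives (column "1"), the share predicted negative
fnr <- cm["0", "1"] / sum(cm[, "1"])
c(FPR = fpr, FNR = fnr)
```

For the Titanic data, `table(predicted = titanic$pred, actual = titanic$Survived)` gives the corresponding counts; since `pred` is logical there, its rows are labelled `FALSE`/`TRUE` rather than `0`/`1`.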

Positive predictive value

The positive predictive value is the rate at which the model is correct when it says “positive”:

\[PPV = \frac{\text{number of times the model said "positive" and was correct}}{\text{number of times the model said "positive"}}\]
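The formula translates directly into R in the same style as the FPR and FNR computations. A minimal sketch, assuming a hypothetical helper `ppv()` (not from any package) and made-up toy vectors:

```r
# Hypothetical helper: correct "positive" predictions / all "positive" predictions
ppv <- function(pred, actual, positive = 1) {
  said_positive <- pred == positive
  sum(said_positive & actual == positive) / sum(said_positive)
}

# Made-up predictions and outcomes (1 = survived, 0 = died)
toy_pred   <- c(1, 0, 0, 1, 1, 0, 0, 1, 0, 0)
toy_actual <- c(1, 0, 1, 0, 1, 0, 0, 1, 1, 0)

# 4 "positive" predictions, 3 of them correct
ppv(toy_pred, toy_actual)
```

For the Titanic model, `ppv(titanic$pred, titanic$Survived)` is equivalent to `sum(titanic$pred == 1 & titanic$Survived == 1)/sum(titanic$pred == 1)`; comparing the logical `pred` column with 1 works because `TRUE == 1` is `TRUE` in R.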

Which is “positive” and which is “negative”?

There is no hard-and-fast rule, but usually “positive” is the surprising, rare, or important event (e.g., the patient has a rare or significant disease).
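The choice affects naming: relabeling which class is “positive” swaps the false positive rate and the false negative rate. A small sketch to check this, using a hypothetical helper `error_rates()` with a `positive` argument and made-up toy vectors:

```r
# Hypothetical helper: FPR and FNR for a chosen "positive" label
error_rates <- function(pred, actual, positive) {
  c(
    FPR = sum(pred == positive & actual != positive) / sum(actual != positive),
    FNR = sum(pred != positive & actual == positive) / sum(actual == positive)
  )
}

toy_pred   <- c(1, 0, 0, 1, 1, 0, 0, 1, 0, 0)  # made-up predictions
toy_actual <- c(1, 0, 1, 0, 1, 0, 0, 1, 1, 0)  # made-up outcomes (1 = survived)

survived_positive <- error_rates(toy_pred, toy_actual, positive = 1)
died_positive     <- error_rates(toy_pred, toy_actual, positive = 0)

# The FPR under one labeling is the FNR under the other, and vice versa
survived_positive
died_positive
```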