Histograms

library(ggplot2)
titanic <- read.csv("http://guerzhoy.princeton.edu/201s20/titanic.csv")

A more typical use is plotting the histogram of a continuous variable, such as age.

ggplot(data = titanic, mapping = aes(x = Age)) +
  geom_histogram(bins = 10)
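To make concrete what a histogram computes, here is a minimal sketch in base R: the range of the variable is split into intervals (bins), and the observations falling into each interval are counted. The ages and the break points below are made up for illustration and are not taken from the Titanic data. ggplot2 chooses its bin edges by its own rules, so its counts can differ from this sketch near the edges.

```r
# Hypothetical ages, made up for illustration (not the Titanic data)
ages <- c(2, 19, 22, 24, 28, 30, 35, 41, 54, 71)

# Bin edges chosen by hand: [0,20), [20,40), [40,60), [60,80)
breaks <- seq(0, 80, by = 20)

# Assign each age to its bin; right = FALSE makes intervals closed on the left
bin <- cut(ages, breaks = breaks, right = FALSE)

# Count the observations per bin; these counts are the bar heights
counts <- table(bin)
print(counts)
```

The heights of the bars in a histogram are exactly these per-bin counts.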

Varying the number of bins changes how well the plot represents the data. With too many bins, each bin holds only a few observations, so we’ll see patterns that are only there because the sample size is too small. With too few bins, we won’t see trends that are actually in the data.

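# Too many bins: the bar heights are dominated by noise
ggplot(data = titanic, mapping = aes(x = Age)) +
  geom_histogram(bins = 100)

# Too few bins: the shape of the distribution is lost
ggplot(data = titanic, mapping = aes(x = Age)) +
  geom_histogram(bins = 3)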
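Between these two extremes, a rule of thumb can suggest a starting value for the number of bins. The sketch below uses two helpers that ship with base R (in grDevices), nclass.Sturges() and nclass.FD(). The ages are simulated purely for illustration and are not the Titanic data. The suggested values are starting points to adjust by eye, not a definitive answer.

```r
# Hypothetical ages, simulated for illustration (not the Titanic data)
set.seed(1)
ages <- round(runif(200, min = 1, max = 80))

# Sturges' rule: ceiling(log2(n) + 1), depends only on the sample size
n_sturges <- nclass.Sturges(ages)

# Freedman-Diaconis rule: bin width based on the interquartile range
n_fd <- nclass.FD(ages)

# The bin width implied by a given number of bins
width_sturges <- diff(range(ages)) / n_sturges

print(c(sturges = n_sturges, fd = n_fd))
```

Either value could then be passed to geom_histogram() through its bins argument. If the variable contains missing values, they would need to be dropped first (for example with na.omit()), since nclass.FD() does not remove them itself.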

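We can display overlapping histograms, one per group. Setting position = "identity" draws each group's bars starting from zero, so the bars of the two groups overlap. We specify alpha = 0.4 to make the histograms partially transparent, so that both remain visible where they overlap.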

ggplot(data = titanic, mapping = aes(x = Age, fill = Sex)) +
  geom_histogram(alpha = 0.4, bins = 10, position = "identity")

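Note that the default position is "stack", which places the bars of one group on top of the bars of the other, so the total height of each bar is the count for both sexes combined.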

ggplot(data = titanic, mapping = aes(x = Age, fill = Sex)) +
  geom_histogram(alpha = 0.4, bins = 10, position = "stack")

Here is the same histogram with position "dodge", which places the bars of the two groups side by side within each bin:

ggplot(data = titanic, mapping = aes(x = Age, fill = Sex)) +
  geom_histogram(alpha = 0.4, bins = 10, position = "dodge")
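To see what the three position settings actually change, one option is to inspect the bar coordinates that ggplot2 computes, using layer_data(). The sketch below uses a small made-up data frame (toy) and a helper function (bars_for) that are both hypothetical and introduced only for this illustration.

```r
library(ggplot2)

# Hypothetical toy data, made up for illustration (not the Titanic data)
toy <- data.frame(
  Age = c(5, 6, 25, 26, 27, 5, 24, 25),
  Sex = c("female", "female", "female", "female", "female",
          "male", "male", "male")
)

# Build the same histogram under a given position adjustment and
# return the bar coordinates that ggplot2 computed for it
bars_for <- function(position) {
  p <- ggplot(data = toy, mapping = aes(x = Age, fill = Sex)) +
    geom_histogram(bins = 2, position = position)
  layer_data(p)
}

bars_identity <- bars_for("identity")
bars_stack <- bars_for("stack")
bars_dodge <- bars_for("dodge")

cols <- c("group", "xmin", "xmax", "ymin", "ymax")

# identity: every bar starts at ymin = 0, so the groups overlap
print(bars_identity[, cols])

# stack: one group's bars start where the other group's bars end
print(bars_stack[, cols])

# dodge: every bar starts at ymin = 0, but the bars are narrower
# and shifted sideways so the groups sit next to each other
print(bars_dodge[, cols])
```

Comparing the ymin and ymax columns shows why stacked bars are taller than overlapping ones: under "stack" the heights add up, while under "identity" and "dodge" each group's bar keeps its own count.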