What the network learns
The six hidden units in the bottleneck connected to the
input representation of person 1 learn to represent
features of people that are useful for predicting the
answer.
Examples are nationality, generation, and branch of the family tree.
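A minimal sketch of what such a bottleneck computes, under stated assumptions: the person input is a one-hot vector over 24 people, the units are logistic, and the weights are random stand-ins for values that would really be learned. The names NUM_PEOPLE, weights, and encode_person are hypothetical. Because the input is one-hot, each person simply selects one row of the weight matrix, so the six activations act as that person's feature vector.

```python
import math
import random

NUM_PEOPLE = 24    # assumption: size of the one-hot person input
NUM_FEATURES = 6   # the six bottleneck units described in the text

random.seed(0)
# Hypothetical weights; in a trained network these would be learned.
weights = [[random.uniform(-1.0, 1.0) for _ in range(NUM_FEATURES)]
           for _ in range(NUM_PEOPLE)]


def one_hot(index, size):
    """Return a list with 1.0 at position index and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def encode_person(person_index):
    """Map a one-hot person input to six bottleneck activations."""
    x = one_hot(person_index, NUM_PEOPLE)
    # Weighted sum into each bottleneck unit.
    pre = [sum(x[i] * weights[i][j] for i in range(NUM_PEOPLE))
           for j in range(NUM_FEATURES)]
    # Logistic nonlinearity (an assumption about the unit type).
    return [1.0 / (1.0 + math.exp(-a)) for a in pre]
```

Calling encode_person(3) returns six numbers between 0 and 1; after training, individual units could come to track properties such as generation or branch of the tree.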
These features are only useful if the other bottlenecks
use similar representations and the central layer learns
how features predict other features. For example:
Input person is of generation 3 and
relationship requires answer to be one generation up
implies
Output person is of generation 2
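The rule above can be written out by hand as a small sketch. This is not how the network stores it (the network encodes such regularities in learned weights); it only shows the kind of feature-to-feature mapping the central layer has to capture. The relationship names, the GENERATION_SHIFT table, and the convention that "one generation up" means subtracting 1 from the generation number are assumptions made for illustration.

```python
# Hypothetical hand-coded version of a rule the central layer might learn.
# Assumption: generations are numbered so that "up" means a smaller number.
GENERATION_SHIFT = {
    "father": -1, "mother": -1, "uncle": -1, "aunt": -1,   # one generation up
    "son": 1, "daughter": 1, "nephew": 1, "niece": 1,      # one generation down
    "husband": 0, "wife": 0, "brother": 0, "sister": 0,    # same generation
}


def predict_output_generation(input_generation, relationship):
    """Combine an input-person feature with a relationship feature."""
    return input_generation + GENERATION_SHIFT[relationship]
```

For instance, predict_output_generation(3, "father") gives 2, matching the example in the text.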