The cost function for a low-dimensional representation
For points where p_ij is large and q_ij is small we lose a lot.
Nearby points in high-D really want to be nearby in low-D.
For points where q_ij is large and p_ij is small we lose a little, because we waste some of the probability mass in the Q distribution.
Widely separated points in high-D have a mild preference for not being too close in low-D.
But it doesn’t cost much to make the manifold bend back on itself, so parts that are far apart in high-D can end up close together in low-D.
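
The asymmetry described above can be checked numerically. The sketch below assumes the cost is the Kullback-Leibler divergence KL(P || Q) = sum_j p_j log(p_j / q_j) over one point's neighbor distribution; the slide text does not reproduce the formula, so treat that as an assumption. The probabilities and the helper names kl_cost and normalize are made up for illustration.

```python
import math

def kl_cost(p, q):
    """KL(P || Q) = sum_j p_j * log(p_j / q_j) for one point's neighbors (assumed cost)."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)

def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical high-D neighbor probabilities for one point:
# two near neighbors (indices 0 and 1) and one far point (index 2).
p = [0.49, 0.49, 0.02]

# Case A: near neighbor 0 is placed far away in low-D (p large, q small).
q_near_pushed_away = normalize([0.02, 0.49, 0.02])

# Case B: far point 2 is placed close in low-D (q large, p small).
q_far_pulled_close = normalize([0.49, 0.49, 0.49])

cost_a = kl_cost(p, q_near_pushed_away)
cost_b = kl_cost(p, q_far_pulled_close)

print(f"near neighbor pushed away: {cost_a:.3f}")
print(f"far point pulled close:    {cost_b:.3f}")
```

With these made-up numbers, Case A should cost noticeably more than Case B. In Case B the only penalty is indirect: the mass given to the far point is taken away from the true neighbors, which is the "wasted probability mass" in Q.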