etc.
Then freeze the first layer of weights
in both directions (the generative weights
and their transpose used for inference)
and learn the remaining weights, which
are still tied together.
This is equivalent to learning
another RBM, using the
aggregated posterior distribution
of h0 as the data.
[Figure: the net unrolled into layers v0, h0, v1, h1, v2, h2, from bottom to top.]
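The greedy procedure described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the lecture's own code. It assumes binary units, one-step contrastive divergence (CD-1) as the RBM learning rule, and small random initial weights for each new RBM. It also approximates the aggregated posterior of h0, q(h0) = (1/N) Σ_n p(h0 | v0^(n)), by drawing one binary sample of h0 per training vector. The names `RBM`, `cd1_update` and `greedy_stack` are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample(p, rng):
    # Draw a binary state that is 1 with probability p.
    return 1 if rng.random() < p else 0

class RBM:
    """Hypothetical binary RBM with weights W[i][j] between visible i and hidden j."""

    def __init__(self, n_vis, n_hid, rng):
        # Assumption: small random initial weights and zero biases.
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hid)] for _ in range(n_vis)]
        self.b = [0.0] * n_vis  # visible biases
        self.c = [0.0] * n_hid  # hidden biases
        self.rng = rng

    def p_h_given_v(self, v):
        # Inference uses the weights in the "upward" direction.
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def p_v_given_h(self, h):
        # Generation uses the same weights in the "downward" direction (tied).
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1_update(self, v0, lr):
        # One step of contrastive divergence (CD-1), an assumed learning rule.
        ph0 = self.p_h_given_v(v0)
        h0 = [sample(p, self.rng) for p in ph0]
        v1 = [sample(p, self.rng) for p in self.p_v_given_h(h0)]
        ph1 = self.p_h_given_v(v1)
        for i in range(len(self.b)):
            for j in range(len(self.c)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(self.c)):
            self.c[j] += lr * (ph0[j] - ph1[j])

    def train(self, data, epochs, lr):
        for _ in range(epochs):
            for v in data:
                self.cd1_update(v, lr)

def greedy_stack(data, layer_sizes, epochs=20, lr=0.1, seed=0):
    """Train one RBM per layer; each trained RBM is frozen before the next is learned."""
    rng = random.Random(seed)
    rbms = []
    layer_input = data
    n_vis = len(data[0])
    for n_hid in layer_sizes:
        rbm = RBM(n_vis, n_hid, rng)
        rbm.train(layer_input, epochs, lr)
        rbms.append(rbm)  # frozen: this RBM is never updated again
        # Samples from the aggregated posterior of the hidden layer
        # become the "data" for the next RBM.
        layer_input = [[sample(p, rng) for p in rbm.p_h_given_v(v)] for v in layer_input]
        n_vis = n_hid
    return rbms
```

For example, `greedy_stack([[1, 1, 0, 0], [0, 0, 1, 1]] * 5, [3, 2])` returns two RBMs: the first models the 4-unit data, and the second models samples of the 3 hidden units of the first. Sampling binary states rather than passing the probabilities upward is one of several reasonable choices for representing the aggregated posterior.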