Learning a deep directed network
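First, learn with all the weights tied.
  - This is exactly equivalent to learning an RBM: an infinitely deep directed network in which every layer uses the same weight matrix (transposed on alternate layers) defines the same distribution over v0 as an RBM with that weight matrix.
  - Contrastive divergence learning is equivalent to ignoring the small derivatives contributed by the tied weights between deeper layers.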
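The link between tied weights and contrastive divergence can be made explicit with the standard telescoping argument, sketched here under stated assumptions. The notation is assumed for illustration: v_i^k and h_j^k are the binary states of unit i in layer v^k and unit j in layer h^k, sampled from the posterior by starting at the data v^0 and alternating upward; w_ij is the single weight shared by every layer. Each term is the logistic belief net rule (parent state times child state minus child probability), where the child's probability is replaced by the sampled state of the next layer up, which has exactly that probability when the weights are tied.

```latex
\frac{\partial \log p(\mathbf{v}^0)}{\partial w_{ij}}
  = \langle h_j^0 (v_i^0 - v_i^1) \rangle
  + \langle v_i^1 (h_j^0 - h_j^1) \rangle
  + \langle h_j^1 (v_i^1 - v_i^2) \rangle + \cdots
  = \langle v_i^0 h_j^0 \rangle - \langle v_i^\infty h_j^\infty \rangle

% Contrastive divergence (CD-1) keeps only the first two terms:
\Delta w_{ij} \propto \langle v_i^0 h_j^0 \rangle - \langle v_i^1 h_j^1 \rangle
```

The intermediate products cancel in pairs, so the full sum equals the RBM maximum-likelihood gradient (data statistics minus equilibrium statistics). Truncating after the first two terms drops exactly the contributions of the deeper tied weights.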
[Figure: stacked layers v0, h0, v1, h1, v2, h2, etc.; a second panel labels layers v0 and h0.]
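In practice, learning with all the weights tied amounts to training one RBM with contrastive divergence. The following is a minimal NumPy sketch of CD-1 for binary units, not the lecture's own code: the class name `RBM`, the biases `b` and `c`, the learning rate, the initialization scale and the toy data are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Hypothetical binary RBM trained with CD-1 (illustrative sketch only)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        # One weight matrix W: the matrix that every layer of the equivalent
        # infinite directed net shares (W going down, W.T going up).
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # assumed visible biases
        self.c = np.zeros(n_hidden)   # assumed hidden biases

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def sample(self, p):
        # Draw binary states with the given probabilities.
        return (self.rng.random(p.shape) < p).astype(float)

    def cd1_update(self, v0, lr=0.1):
        # Up: hidden probabilities and sampled states given the data v0.
        ph0 = self.hidden_probs(v0)
        h0 = self.sample(ph0)
        # Down then up once more: one reconstruction v1, then p(h1 | v1).
        v1 = self.sample(self.visible_probs(h0))
        ph1 = self.hidden_probs(v1)
        n = v0.shape[0]
        # <v0 h0> - <v1 h1>: only the first two terms of the telescoping sum;
        # the derivatives from deeper tied layers are ignored.
        self.W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
        self.b += lr * (v0 - v1).mean(axis=0)
        self.c += lr * (ph0 - ph1).mean(axis=0)

    def reconstruction_error(self, v):
        # Mean squared error of a deterministic up-down pass.
        recon = self.visible_probs(self.hidden_probs(v))
        return float(np.mean((v - recon) ** 2))


# Toy usage: two complementary binary patterns (made-up data).
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]], dtype=float)
rbm = RBM(n_visible=6, n_hidden=4)
before = rbm.reconstruction_error(data)
for _ in range(2000):
    rbm.cd1_update(data)
after = rbm.reconstruction_error(data)
print(f"reconstruction error: {before:.3f} -> {after:.3f}")
```

The update uses hidden probabilities rather than sampled states in the weight statistics, a common variance-reducing choice; using the sampled `h0` instead would also be a valid CD-1 estimate.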