A directed module also converts its data
distribution into an aggregated posterior.
Task 1 is now harder because the
posterior for each training case is non-factorial.
Task 2 is performed using an
independent prior. This is a bad
approximation unless the aggregated
posterior is close to factorial.
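
The sketch below shows one way to check how close an aggregated posterior is to factorial. It averages the per-case posteriors over the training set and computes the KL divergence between that average and the product of its marginals. The two training cases, their probabilities, and the helper names aggregated_posterior and kl_to_factorial are made up for illustration. For simplicity each toy per-case posterior is itself factorial, which is not the case for a directed module, yet the aggregate still comes out non-factorial.

```python
import numpy as np


def aggregated_posterior(per_case_posteriors):
    """Average the per-case posteriors over the training cases (axis 0)."""
    return np.mean(per_case_posteriors, axis=0)


def kl_to_factorial(joint):
    """KL(joint || product of its marginals) in nats, for a 2-D table.

    The value is zero exactly when the joint is factorial.
    """
    p_h1 = joint.sum(axis=1, keepdims=True)  # marginal of hidden unit 1
    p_h2 = joint.sum(axis=0, keepdims=True)  # marginal of hidden unit 2
    product = p_h1 * p_h2
    mask = joint > 0  # skip zero entries, which contribute nothing
    return float(np.sum(joint[mask] * np.log(joint[mask] / product[mask])))


# Hypothetical posteriors over two binary hidden units for two training cases.
# Each table is an outer product, so each per-case posterior is factorial.
case_a = np.outer([0.9, 0.1], [0.9, 0.1])
case_b = np.outer([0.1, 0.9], [0.1, 0.9])

agg = aggregated_posterior(np.stack([case_a, case_b]))
gap = kl_to_factorial(agg)

print("aggregated posterior:\n", agg)
print("KL to product of marginals (nats):", gap)
```

A gap near zero would mean an independent prior is a reasonable fit for Task 2. Here the gap is clearly positive, so an independent prior would model this aggregate poorly.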
A directed module attempts to make the
aggregated posterior factorial in one step.
This is too difficult and leads to a bad
compromise. There is no guarantee
that the aggregated posterior is easier
to model than the data distribution.
[Figure: diagram with the labels "Task 2", "posterior distribution on hidden units", "Task 1", and "data distribution on visible units".]