IJCAI 2005 Research Excellence Award Lecture


Why the hidden configurations should be treated
	as data when learning the next layer of weights


•	After learning the first layer of weights:

•	If we freeze the generative weights that define the
	likelihood term and the recognition weights that define
	the distribution over hidden configurations, we get:

•	Maximizing the RHS is equivalent to maximizing the log
	prob of “data” that occurs with probability given by the
	frozen distribution over hidden configurations.