What the network learns
It learns four distinct patterns of activity for the 3 hidden
units. These patterns correspond to the nodes in the
finite state automaton.
Do not confuse units in a neural network with nodes
in a finite state automaton. A node corresponds to an
activity vector over all the hidden units, not to a
single unit.
The automaton is restricted to be in exactly one state
at each time. The hidden units are restricted to have
exactly one vector of activity at each time.
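The correspondence between automaton nodes and activity vectors can be sketched as follows. This is a minimal illustration, not the learned solution: the state names and the particular binary codes are hypothetical, and real hidden activities are real-valued, so the sketch thresholds them at 0.5 (an assumption) before looking up a state.

```python
# Hypothetical codes: any four distinct vectors over 3 units would do.
STATE_TO_VECTOR = {
    "s0": (0, 0, 0),
    "s1": (0, 1, 1),
    "s2": (1, 0, 1),
    "s3": (1, 1, 0),
}
VECTOR_TO_STATE = {vec: state for state, vec in STATE_TO_VECTOR.items()}


def decode(hidden):
    """Map a vector of hidden activities to an automaton state.

    Activities are thresholded at 0.5 (an assumed cutoff). Returns None
    when the thresholded pattern is not one of the four used codes.
    """
    key = tuple(int(h > 0.5) for h in hidden)
    return VECTOR_TO_STATE.get(key)
```

For example, `decode((0.1, 0.9, 0.8))` looks up the pattern `(0, 1, 1)`. At any one time the hidden units hold one vector, so `decode` yields at most one state, mirroring the automaton being in exactly one state.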
A recurrent network can emulate a finite state
automaton, but its hidden state is exponentially more
expressive. With N hidden neurons it has 2^N possible
binary activity vectors in the hidden units.
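To make the 2^N count concrete, the binary activity vectors can simply be enumerated. This is a small sketch; the function name `binary_activity_vectors` is made up for illustration.

```python
from itertools import product


def binary_activity_vectors(n_hidden):
    """Return every binary activity vector over n_hidden units."""
    return list(product((0, 1), repeat=n_hidden))


# With 3 hidden units there are 2**3 possible binary vectors,
# of which the network described above uses only four.
vectors = binary_activity_vectors(3)
```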
This is important when the input stream has several
separate things going on at once. A finite state
automaton needs a separate state for every combination
of those things, so its number of states multiplies.
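A rough count shows why several simultaneous processes are awkward for an automaton. The sketch below assumes two independent processes and an idealised binary code for the hidden units, which is the smallest number of units that could distinguish the states and not necessarily what a trained network uses. Both function names are hypothetical.

```python
def joint_automaton_states(states_a, states_b):
    """States needed by one automaton tracking two independent processes."""
    return states_a * states_b


def min_binary_units(n_states):
    """Fewest binary units that can give n_states distinct codes."""
    return (n_states - 1).bit_length()


# Two processes with 4 states each.
joint_states = joint_automaton_states(4, 4)        # automaton: states multiply
joint_units = min_binary_units(4) + min_binary_units(4)  # units: counts add
```

Under these assumptions the automaton's state count is squared, while the number of hidden units only doubles.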