 
  
  
   
New approach: incrementally compute and execute optimal policies on-line using the interpreter.
 and the residual program p'
 The cycle of computing an optimal policy and the residual program, executing the first action, and obtaining sensory information (if necessary) repeats until the program completes or execution fails.
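The cycle above can be illustrated with a minimal Python sketch. It is not the interpreter itself. Every name here (`run_online`, `plan_step`, `execute`, `sense`, `is_final`, `max_cycles`) is hypothetical and introduced only for illustration. The sketch assumes that the planner returns a policy as a list of actions together with the residual program p' that remains after the first action of that policy has been executed, and that a failed action is signalled by `None`.

```python
from typing import Any, Callable, List, Optional, Tuple


def run_online(
    program: Any,
    state: Any,
    plan_step: Callable[[Any, Any], Optional[Tuple[List[str], Any]]],
    execute: Callable[[str, Any], Optional[Any]],
    sense: Callable[[str, Any], Any],
    is_final: Callable[[Any, Any], bool],
    max_cycles: int = 1000,
) -> Tuple[bool, Any, List[str]]:
    """Hypothetical on-line loop: plan, execute the first action, sense, repeat.

    Returns (success, final_state, executed_actions).
    """
    executed: List[str] = []
    for _ in range(max_cycles):
        # Stop successfully once the program has completed.
        if is_final(program, state):
            return True, state, executed
        # Compute a policy and the residual program (assumed planner interface).
        planned = plan_step(program, state)
        if planned is None:
            return False, state, executed
        policy, residual = planned
        if not policy:
            return False, state, executed
        # Execute only the first action of the policy in the world.
        action = policy[0]
        new_state = execute(action, state)
        if new_state is None:
            # Execution failed; report what was done so far.
            return False, state, executed
        # Obtain sensory information if the action requires it.
        state = sense(action, new_state)
        executed.append(action)
        # Continue with the residual program.
        program = residual
    # Cycle budget exhausted without completing the program.
    return False, state, executed


# Toy domain (assumption): a program is a list of action names,
# the state is the list of actions performed so far.
def toy_plan_step(program, state):
    if not program:
        return None
    return list(program), program[1:]


def toy_execute(action, state):
    # The action named "fail" stands in for an execution failure.
    return None if action == "fail" else state + [action]


def toy_sense(action, state):
    # No sensing is needed in the toy domain.
    return state


def toy_is_final(program, state):
    return len(program) == 0
```

In the toy domain, calling `run_online(["a", "b"], [], toy_plan_step, toy_execute, toy_sense, toy_is_final)` runs both actions and reports success, whereas a program containing `"fail"` stops at that action and reports failure together with the actions executed before it.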