The decision problem faced by an agent is that of finding an
optimal policy, i.e., one that maximizes the expected total
accumulated reward.
An important observation: there is no need to consider the many `unnatural' policies when searching for an optimal policy.
Hence, we use Golog, a programming language based on the situation calculus, to provide natural constraints on the search.
The control structures of Golog can be used to guide the search for an optimal policy: the nondeterministic choice constructs indicate where search is needed, while the remaining constructs fix what the agent does.
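
The idea that search happens only at the nondeterministic choice points can be illustrated with a small sketch. This is not Golog and not the situation calculus: it is a toy Python model under simplifying assumptions. The program encoding (`"act"`, `"seq"`, `"choice"`), the action names, and the `OUTCOMES` table are all hypothetical, and rewards are assumed to be independent of state, so the result is a fixed sequence of actions rather than a policy that branches on outcomes.

```python
# Toy illustration (not Golog): a program restricts the policy search, and
# maximization happens only at nondeterministic choice points.
# Hypothetical model: each action has stochastic outcomes (probability, reward),
# and rewards do not depend on state.
OUTCOMES = {
    "take_stairs": [(1.0, 2.0)],
    "take_elevator": [(0.8, 5.0), (0.2, -5.0)],
    "deliver": [(0.9, 10.0), (0.1, 0.0)],
}


def best(program):
    """Return (expected_reward, actions) for the best resolution of all choices."""
    kind = program[0]
    if kind == "act":
        # Primitive action: no search, just its expected reward.
        name = program[1]
        return sum(p * r for p, r in OUTCOMES[name]), [name]
    if kind == "seq":
        # Sequence: no search, expected rewards of the parts add up.
        total, actions = 0.0, []
        for sub in program[1:]:
            value, chosen = best(sub)
            total += value
            actions += chosen
        return total, actions
    if kind == "choice":
        # Nondeterministic choice: the only place where alternatives are compared.
        return max((best(sub) for sub in program[1:]), key=lambda vp: vp[0])
    raise ValueError(f"unknown construct: {kind!r}")


# The program fixes the overall structure; only the first step is left open.
PROGRAM = (
    "seq",
    ("choice", ("act", "take_stairs"), ("act", "take_elevator")),
    ("act", "deliver"),
)

value, actions = best(PROGRAM)
print(value, actions)
```

Here the search space contains only the two alternatives offered by the single `"choice"` node, rather than every conceivable sequence of actions. A faithful treatment would track states and let the policy condition on the observed outcome of each stochastic action.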