
Additional axioms.

Axioms specifying probabilities of outcomes use the function symbol prob(n,s), e.g.
tex2html_wrap_inline853

Axioms specifying outcome identification conditions use the predicate tex2html_wrap_inline855, e.g.
tex2html_wrap_inline861

Reward axioms use the function reward(do(a,s)) to assert costs and rewards, e.g. tex2html_wrap_inline869 = 7.8.

Reward functions can also be time-dependent. For example, the reward can be defined as the maximum of a linear function of time, where the maximum is taken subject to the temporal inequalities between actions in the situation term:

reward(do(giveS(Mail,Ray,t),s)) = max (30 - t/10), with respect to s.
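As a minimal sketch of how such a reward could be evaluated, assume that the temporal inequalities in s reduce to a closed interval of feasible values for t (a simplification that the text does not state). Because 30 - t/10 is decreasing in t, the maximum is then attained at the earliest feasible time. The function name and the bounds used below are hypothetical.

```python
def time_dependent_reward(t_earliest: float, t_latest: float) -> float:
    """Maximise 30 - t/10 over t in [t_earliest, t_latest].

    The interval is an assumed stand-in for the temporal inequalities
    between actions in the situation term s.
    """
    if t_earliest > t_latest:
        raise ValueError("temporal constraints are inconsistent")
    # The function is linear and decreasing in t, so its maximum over
    # the interval is reached at the lower bound.
    return 30 - t_earliest / 10


# Hypothetical bounds: the mail can be given no earlier than t = 50.
print(time_dependent_reward(t_earliest=50, t_latest=200))
```

With a richer set of inequalities, the same maximisation would be a small linear program rather than a single bound lookup.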

Thus, we have a set of axioms specifying a Markov decision process.
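To illustrate how probability and reward axioms together determine the quantities an MDP needs, here is a rough sketch that computes the expected immediate reward of a stochastic action. Situations are modelled as tuples of action names, and the outcome names and probabilities (giveSuccess, giveFail, 0.9, 0.1) are invented for illustration. Only the reward value 7.8 is taken from the text, and its assignment to giveSuccess is an assumption.

```python
from typing import Sequence, Tuple

Situation = Tuple[str, ...]
S0: Situation = ()  # initial situation: no actions performed yet


def do(action: str, s: Situation) -> Situation:
    """Situation reached by performing `action` in situation `s`."""
    return s + (action,)


def prob(outcome: str, s: Situation) -> float:
    """Probability of `outcome` in `s` (hypothetical values)."""
    return {"giveSuccess": 0.9, "giveFail": 0.1}[outcome]


def reward(s: Situation) -> float:
    """Reward of a situation, determined here by its last action."""
    if not s:
        return 0.0
    # Assumed assignment: a successful delivery earns 7.8, a failure 0.
    return {"giveSuccess": 7.8, "giveFail": 0.0}[s[-1]]


def expected_reward(outcomes: Sequence[str], s: Situation) -> float:
    """Expected immediate reward over the outcomes of a stochastic action."""
    total = sum(prob(n, s) for n in outcomes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(prob(n, s) * reward(do(n, s)) for n in outcomes)


print(expected_reward(["giveSuccess", "giveFail"], S0))
```

In this sketch the probabilities do not depend on s. In the axiomatisation they may, which is why prob takes the situation as an argument.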