Additional axioms.

Axioms specifying probabilities of outcomes use the function symbol prob(n,s), e.g.

Axioms specifying outcome identification conditions use the predicate , e.g.

Reward axioms use the function reward(do(a,s)) and assert costs and rewards, e.g. =7.8

We can also describe time-dependent reward functions. For example, the reward can be defined as the maximum of a linear function of time, where the maximum is taken subject to the temporal inequalities between the actions in the situation term:

reward(do(giveS(Mail,Ray,t),s)) = max (30 - t/10), with respect to s.
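A minimal sketch of how such a reward might be evaluated, assuming the temporal inequalities in s reduce to a closed interval [t_min, t_max] of feasible times for the giveS action. That interval representation and the function names reward_at and max_reward are assumptions made for illustration; they are not part of the axiomatization. Because 30 - t/10 is linear in t, its maximum over a closed interval is attained at an endpoint.

```python
def reward_at(t):
    """Linear time-dependent reward from the example axiom: 30 - t/10."""
    return 30 - t / 10


def max_reward(t_min, t_max):
    """Maximum of reward_at(t) over the closed interval [t_min, t_max].

    Assumption: the temporal inequalities in the situation term have
    already been reduced to this interval of feasible times.
    """
    if t_min > t_max:
        raise ValueError("temporal constraints are inconsistent: empty interval")
    # A linear function attains its maximum over a closed interval at an endpoint.
    return max(reward_at(t_min), reward_at(t_max))
```

Since the reward decreases with t, the maximum here is always reached at the earliest feasible time; checking both endpoints simply keeps the sketch valid if the slope were positive.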

Thus, we have a set of axioms specifying a Markov decision process.
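To illustrate how probability and reward axioms together determine the quantities an MDP needs, here is a hedged sketch that computes the one-step expected reward of a stochastic action from its outcomes. The Outcome class, the outcome name giveF, and all numeric values are hypothetical and chosen only for illustration; they do not come from the axioms above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """One of nature's possible outcomes for a stochastic agent action."""
    name: str      # outcome label (hypothetical)
    prob: float    # value a probability axiom would assign
    reward: float  # value a reward axiom would assign


def expected_reward(outcomes):
    """Probability-weighted sum of rewards over all outcomes of one action."""
    total = sum(o.prob for o in outcomes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(o.prob * o.reward for o in outcomes)


# Hypothetical example: delivery succeeds (giveS) or fails (giveF).
give_outcomes = [
    Outcome("giveS", 0.9, 25.0),
    Outcome("giveF", 0.1, 0.0),
]
```

Calling expected_reward(give_outcomes) weights each outcome's reward by its probability, which is the basic quantity a decision-theoretic planner would compare across candidate actions.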