Axioms specifying probabilities of outcomes use the function symbol prob(n,s), e.g.
Axioms specifying outcome identification conditions use the predicate , e.g.
Reward axioms use the function reward(do(a,s)) and assert costs and rewards, e.g. =7.8
Reward functions can also be time-dependent. For example, the reward can be defined as the maximum of a linear function of time, where the maximum is taken over the times that satisfy the temporal inequalities between actions in the situation term:
reward(do(giveS(Mail,Ray,t),s)) = max(30 - t/10), with respect to s.
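The time-dependent reward can be illustrated with a minimal Python sketch. It assumes that the temporal inequalities in s reduce to a closed interval [t_min, t_max] of feasible times for giveS. That simplification, and the names linear_reward, max_reward, t_min and t_max, are hypothetical and not part of the axiomatization. Because 30 - t/10 is linear in t, its maximum over a closed interval is attained at an endpoint.

```python
def linear_reward(t):
    """Reward from the example axiom: 30 - t/10."""
    return 30 - t / 10


def max_reward(t_min, t_max):
    """Maximise the linear reward over the feasible interval [t_min, t_max].

    Assumption: the temporal constraints of the situation have already been
    reduced to this interval. A linear function attains its maximum over a
    closed interval at one of the endpoints, so checking both is enough.
    """
    if t_min > t_max:
        raise ValueError("temporal constraints are inconsistent")
    return max(linear_reward(t_min), linear_reward(t_max))
```

Since the reward decreases with t, the maximum falls on the earliest feasible time: delivering the mail sooner is worth more.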
Thus, we have a set of axioms specifying a Markov decision process.
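To see how the probability and reward axioms jointly determine the MDP quantities, the following Python sketch computes the expected immediate reward of one stochastic action from its outcomes. The outcome names, probabilities and reward values are hypothetical placeholders chosen for illustration, and expected_reward is an assumed helper rather than anything defined in the axioms.

```python
import math

# Hypothetical outcomes of one stochastic action: name -> (probability, reward).
# The probabilities play the role of prob(n,s) and the rewards the role of
# reward(do(n,s)); the numbers are illustrative only.
OUTCOMES = {
    "giveSuccess": (0.75, 8.0),
    "giveFail": (0.25, -4.0),
}


def expected_reward(outcomes):
    """Return the probability-weighted sum of rewards over all outcomes."""
    total_prob = sum(p for p, _ in outcomes.values())
    if not math.isclose(total_prob, 1.0):
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * r for p, r in outcomes.values())
```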