Abstract:
This paper studies minimizing-risk problems in Markov decision processes with countable state space and reward set. The objective is to find a policy that minimizes the risk probability that the total discounted reward does not exceed a specified value (the target).
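In symbols, a minimal sketch of the criterion (the notation $\beta$, $r_t$, $X_0$, and $\lambda$ is assumed here for illustration and is not fixed by the abstract): for discount factor $\beta \in (0,1)$, reward $r_t$ at epoch $t$, initial state $i$, and target $\lambda$, the problem is
\[
\min_{\pi} \; P^{\pi}\!\left( \sum_{t \ge 0} \beta^{t} r_{t} \le \lambda \;\middle|\; X_{0} = i \right),
\]
with the sum truncated at the horizon in the finite-horizon case.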
In this sort of model, the decision made by the decision maker depends not only on the system's state but also on the target value. By introducing the decision-maker's state, we formulate a framework for minimizing-risk models. The policies discussed depend on target values, and the rewards may be arbitrary real numbers.
For the finite-horizon model, the main results are: (i) the optimal value functions are distribution functions of the target; (ii) there exists an optimal deterministic Markov policy; and (iii) a policy is optimal if and only if, at each realizable state, it always takes an optimal action. In addition, we obtain a sufficient condition and a necessary condition for the existence of a finite-horizon optimal policy independent of the target, and we give an algorithm that computes finite-horizon optimal policies and optimal value functions.
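As an illustration of how such a computation can proceed (a minimal sketch under assumed data, not the paper's algorithm: the names P, r, beta, targets, and horizon, and the discretized target grid itself, are hypothetical), backward induction runs over pairs of system state and residual target:

```python
# Sketch: finite-horizon backward induction for the risk-minimizing
# criterion over (state, target) pairs. All inputs are illustrative.
import numpy as np

def backward_induction(P, r, beta, targets, horizon):
    """P[a][i][j]: transition probabilities; r[a][i]: rewards;
    beta: discount factor in (0, 1); targets: increasing grid of targets.
    Returns F[n][i][k] ~ minimal probability that the n-step discounted
    reward from state i does not exceed targets[k], plus a greedy policy."""
    n_actions, n_states = len(P), len(P[0])
    # Horizon 0: total reward is 0, so the risk is 1 iff the target >= 0.
    F = [np.array([[1.0 if lam >= 0 else 0.0 for lam in targets]
                   for _ in range(n_states)])]
    policy = []
    for n in range(1, horizon + 1):
        Fn = np.empty((n_states, len(targets)))
        Pn = np.empty((n_states, len(targets)), dtype=int)
        for i in range(n_states):
            for k, lam in enumerate(targets):
                best, best_a = np.inf, 0
                for a in range(n_actions):
                    # Residual target after receiving r[a][i] and discounting.
                    lam2 = (lam - r[a][i]) / beta
                    k2 = np.searchsorted(targets, lam2)  # grid approximation
                    k2 = min(max(k2, 0), len(targets) - 1)
                    val = sum(P[a][i][j] * F[-1][j][k2]
                              for j in range(n_states))
                    if val < best:
                        best, best_a = val, a
                Fn[i, k], Pn[i, k] = best, best_a
        F.append(Fn)
        policy.append(Pn)
    return F, policy
```

The recursion evaluated here is $F_n(i,\lambda) = \min_a \sum_j p_{ij}(a)\, F_{n-1}(j, (\lambda - r(i,a))/\beta)$, which also reflects result (i): each $F_n(i,\cdot)$ is a distribution function in $\lambda$.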
For the infinite-horizon model, we establish the optimality equation and obtain a structural property of optimal policies. We prove that the optimal value function is a distribution function of the target, and we present a new approximation formula that generalizes the nonnegative-reward case.
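For orientation, in target-value models of this type the optimality equation typically takes the following fixed-point form (an assumed rendering with hypothetical notation $u^{*}$, $p_{ij}(a)$, $r(i,a)$; the paper's exact statement may differ):
\[
u^{*}(i,\lambda) \;=\; \min_{a \in A(i)} \sum_{j} p_{ij}(a)\, u^{*}\!\left(j,\; \frac{\lambda - r(i,a)}{\beta}\right),
\]
where $u^{*}(i,\lambda)$ denotes the minimal risk probability from state $i$ with target $\lambda$.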
An example illustrating the mistakes of previous literature shows that the existence of an optimal policy has not actually been proved. In this paper, we give an existence condition: a necessary and sufficient condition for the existence of an infinite-horizon optimal policy independent of the target. We also point out that, in the general case, whether an optimal policy exists remains an open problem.