Title :
The policy iteration algorithm for average reward Markov decision processes with general state space
Author_Institution :
Coordinated Sci. Lab., Illinois Univ., Urbana, IL, USA
Date :
12/1/1997
Abstract :
The average cost optimal control problem is addressed for Markov decision processes with unbounded cost. It is found that the policy iteration algorithm generates a sequence of policies which are c-regular, where c is the cost function under consideration. This result only requires the existence of an initial c-regular policy and an irreducibility condition on the state space. Furthermore, under these conditions the sequence of relative value functions generated by the algorithm is bounded from below and “nearly” decreasing, from which it follows that the algorithm is always convergent. Under further conditions, it is shown that the algorithm does compute a solution to the optimality equations and hence an optimal average cost policy. These results provide elementary criteria for the existence of optimal policies for Markov decision processes with unbounded cost and recover known results for the standard linear-quadratic-Gaussian problem. In particular, in the control of multiclass queueing networks, it is found that there is a close connection between optimization of the network and optimal control of a far simpler fluid network model.
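As an illustration of the policy iteration (Howard) algorithm named in the abstract, the following is a minimal sketch for the much simpler case of a finite state space with bounded cost, not the general state space setting the paper treats. It alternates between solving Poisson's equation for the current policy (giving the average cost `eta` and a relative value function `h`) and a greedy improvement step. All names (`solve_linear`, `evaluate`, `policy_iteration`), the normalization `h(0) = 0`, and the two-state example data are assumptions made for this sketch and do not come from the paper.

```python
def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: policy may not be unichain")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def evaluate(policy, P, c):
    """Solve Poisson's equation for a fixed policy:
    eta + h(x) = c(x, a) + sum_y P_a(x, y) h(y), normalized by h(0) = 0."""
    n = len(policy)
    A, b = [], []
    for x in range(n):
        a = policy[x]
        # Unknowns: z[0] = eta, z[j] = h(j) for j >= 1.
        row = [1.0] + [(1.0 if x == j else 0.0) - P[a][x][j] for j in range(1, n)]
        A.append(row)
        b.append(c[x][a])
    z = solve_linear(A, b)
    return z[0], [0.0] + z[1:]


def policy_iteration(P, c, policy, max_iter=100):
    """P[a][x][y]: transition probabilities; c[x][a]: one-step costs."""
    n, m = len(c), len(P)
    for _ in range(max_iter):
        eta, h = evaluate(policy, P, c)
        new_policy = []
        for x in range(n):
            def q(a):
                return c[x][a] + sum(P[a][x][y] * h[y] for y in range(n))
            best = policy[x]  # keep the current action on ties
            for a in range(m):
                if q(a) < q(best) - 1e-12:
                    best = a
            new_policy.append(best)
        if new_policy == policy:  # no strict improvement anywhere
            return policy, eta, h
        policy = new_policy
    raise RuntimeError("no convergence within max_iter")


# Hypothetical example: state 0 = "good", state 1 = "bad";
# action 0 = "wait", action 1 = "repair".
P = [
    [[0.5, 0.5], [0.1, 0.9]],  # wait
    [[0.9, 0.1], [0.9, 0.1]],  # repair
]
c = [[0.0, 2.0], [4.0, 2.0]]
policy, eta, h = policy_iteration(P, c, [0, 0])
print(policy, eta, h)
```

In this sketch every action has strictly positive transition probabilities, so each policy's chain is irreducible and the linear system in `evaluate` has a unique solution. That assumption is a finite-state stand-in for the irreducibility condition mentioned in the abstract.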
Keywords :
Markov processes; cost optimal control; decision theory; iterative methods; operations research; queueing theory; state-space methods; Howard algorithm; Markov decision processes; Poisson equation; average reward; linear-quadratic-Gaussian; multiclass queueing networks; optimal average cost policy; policy iteration algorithm; state space; Algorithm design and analysis; Cost function; Difference equations; Helium; History; Optimal control; Poisson equations; Stability; State-space methods
Journal_Title :
Automatic Control, IEEE Transactions on