Title :
Convergence of value iterations for total-cost MDPs and POMDPs with general state and action sets
Author :
Feinberg, Eugene A. ; Kasyanov, Pavlo O. ; Zgurovsky, Michael Z.
Author_Institution :
Dept. of Appl. Math. & Stat., Stony Brook Univ., Stony Brook, NY, USA
Abstract :
This paper describes conditions for convergence to optimal values of the dynamic programming algorithm applied to total-cost Markov Decision Processes (MDPs) with Borel state and action sets and with possibly unbounded one-step cost functions. It also studies applications of these results to Partially Observable MDPs (POMDPs). It is well known that POMDPs can be reduced to special MDPs, called Completely Observable MDPs (COMDPs), whose states are probability distributions over the states of the original POMDP. This paper describes conditions on POMDPs under which optimal policies for COMDPs can be found by value iteration. In other words, it provides sufficient conditions for solving total-cost POMDPs with infinite state, observation, and action sets by dynamic programming. Examples of applications to filtration, identification, and inventory control are provided.
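As a rough illustration of the value iteration scheme the abstract refers to, the minimal Python sketch below runs the standard total-cost recursion V_{n+1}(x) = min_a [ c(x,a) + sum_y p(y|x,a) V_n(y) ] on a small finite MDP. The arrays costs and transitions, the function name value_iteration, and all toy numbers are illustrative assumptions, not taken from the paper, whose results cover Borel state and action sets and possibly unbounded one-step costs.

    import numpy as np

    def value_iteration(costs, transitions, n_iterations=1000, tol=1e-9):
        # costs[x, a]         : one-step cost c(x, a)
        # transitions[x, a, y]: transition probability p(y | x, a)
        n_states, n_actions = costs.shape
        V = np.zeros(n_states)                    # start from V_0 = 0
        for _ in range(n_iterations):
            Q = costs + transitions @ V           # Q[x, a] = c(x, a) + E[V_n(next state) | x, a]
            V_next = Q.min(axis=1)                # V_{n+1}(x) = min over actions
            if np.max(np.abs(V_next - V)) < tol:  # stop once the iterates stabilize
                V = V_next
                break
            V = V_next
        return V, Q.argmin(axis=1)                # value function and a greedy stationary policy

    # Toy 2-state, 2-action instance; all numbers are made up for illustration.
    costs = np.array([[1.0, 2.0],
                      [0.5, 0.0]])
    transitions = np.array([[[0.9, 0.1], [0.2, 0.8]],
                            [[0.0, 1.0], [0.0, 1.0]]])
    V, policy = value_iteration(costs, transitions)
    print(V, policy)

For the POMDP case discussed in the abstract, the same recursion would be applied to the COMDP whose states are belief distributions (posterior probabilities of the unobserved state); that reduction is not shown in this sketch.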
Keywords :
Markov processes; convergence of numerical methods; decision making; dynamic programming; iterative methods; Borel state; COMDPs; Markov decision processes; POMDPs; action sets; completely observable MDPs; dynamic programming algorithm; general state; infinite state; partially observable MDPs; sufficient condition; total-cost MDPs; unbounded one-step cost functions; value iterations convergence; Convergence; Cost function; Equations; Kernel; Markov processes
Conference_Title :
Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2014 IEEE Symposium on
Conference_Location :
Orlando, FL, USA
DOI :
10.1109/ADPRL.2014.7010613