DocumentCode :
3535888
Title :
Optimality conditions for total-cost Partially Observable Markov Decision Processes
Author :
Feinberg, Eugene A. ; Kasyanov, Pavlo O. ; Zgurovsky, Michael Z.
Author_Institution :
Dept. of Appl. Math. & Stat., Stony Brook Univ., Stony Brook, NY, USA
fYear :
2013
fDate :
10-13 Dec. 2013
Firstpage :
5716
Lastpage :
5721
Abstract :
This note describes sufficient conditions for the existence of optimal policies for Partially Observable Markov Decision Processes (POMDPs). The objective criterion is either minimization of total discounted costs or minimization of total nonnegative costs. It is well known that a POMDP can be reduced to a Completely Observable Markov Decision Process (COMDP) whose state space is the set of belief probabilities for the POMDP. Thus, a policy is optimal for the POMDP if and only if it corresponds to an optimal policy for the COMDP. Here we provide sufficient conditions for the existence of optimal policies for the COMDP and therefore for the POMDP. In particular, we consider POMDPs with weakly continuous transition probabilities and bounded-below K-inf-compact cost functions. For fully observable MDPs, these two conditions guarantee the following three properties: (i) validity of the finite-horizon and infinite-horizon optimality equations, (ii) convergence of value iterations to the infinite-horizon value function, and (iii) existence of stationary optimal policies. We show that a single additional assumption, that the observation transition probability is continuous in total variation, implies properties (i)-(iii) for the COMDP. Therefore, this condition also implies the existence of optimal policies for POMDPs. We also provide a more general, though less constructive, sufficient condition for the validity of (i)-(iii) for the COMDP, and therefore for the existence of optimal policies for a POMDP and the possibility of finding them by transforming optimal policies for the corresponding COMDP.
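The belief-state reduction and value iteration discussed in the abstract can be illustrated with a minimal sketch: a hypothetical 2-state, 2-action, 2-observation POMDP (all numbers invented for illustration), reduced to a COMDP whose state is the belief probability of being in state 0, with discounted-cost value iteration run on a discretized belief simplex. This is not the paper's construction, only a finite toy instance of the standard reduction it builds on.

```python
import numpy as np

# Hypothetical POMDP (illustrative numbers, not from the paper).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # P[a][s][s']: transition probabilities
              [[0.5, 0.5], [0.4, 0.6]]])
Q = np.array([[0.8, 0.2], [0.3, 0.7]])      # Q[s'][o]: observation probabilities
c = np.array([[1.0, 3.0], [2.0, 0.5]])      # c[a][s]: one-step costs
beta = 0.9                                   # discount factor

def belief_update(b, a, o):
    """Bayes update of the belief after taking action a and observing o."""
    unnorm = Q[:, o] * (b @ P[a])            # proportional to P(s', o | b, a)
    norm = unnorm.sum()
    return unnorm / norm if norm > 0 else b

# Value iteration on the COMDP: belief = (p, 1 - p), p on a grid.
grid = np.linspace(0.0, 1.0, 101)
V = np.zeros_like(grid)
for _ in range(500):
    V_new = np.empty_like(V)
    for i, p in enumerate(grid):
        b = np.array([p, 1.0 - p])
        vals = []
        for a in range(2):
            expected = 0.0
            for o in range(2):
                prob_o = (b @ P[a]) @ Q[:, o]     # P(o | b, a)
                if prob_o > 0:
                    bp = belief_update(b, a, o)
                    expected += prob_o * np.interp(bp[0], grid, V)
            vals.append(b @ c[a] + beta * expected)
        V_new[i] = min(vals)                      # Bellman optimality update
    if np.max(np.abs(V_new - V)) < 1e-10:         # convergence of value iterations
        V = V_new
        break
    V = V_new
```

Since the one-step cost here is bounded by 3.0, the infinite-horizon value function is bounded by 3.0/(1 - beta) = 30, and the discounted Bellman operator is a contraction, so the iterates converge geometrically, matching property (ii) in the abstract for this toy case.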
Keywords :
Markov processes; convergence; decision theory; infinite horizon; iterative methods; optimisation; probability; state-space methods; COMDP; K-inf-compact cost functions; POMDP; belief probabilities; completely observable Markov decision process; continuous transition probabilities; convergence; infinite-horizon optimality equations; infinite-horizon value functions; observation transition probability; optimal policies; optimal policy; optimality conditions; state space; sufficient conditions; total discounted cost minimization; total nonnegative cost minimization; total-cost partially observable Markov decision processes; value iterations; Convergence; Cost function; Equations; Kernel; Markov processes;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
2013 IEEE 52nd Annual Conference on Decision and Control (CDC)
Conference_Location :
Florence, Italy
ISSN :
0743-1546
Print_ISBN :
978-1-4673-5714-2
Type :
conf
DOI :
10.1109/CDC.2013.6760790
Filename :
6760790