DocumentCode :
1743791
Title :
Stochastic optimization of controlled partially observable Markov decision processes
Author :
Bartlett, Peter L. ; Baxter, Jonathan
Author_Institution :
Res. Sch. of Inf. Sci. & Eng., Australian Nat. Univ., ACT, Australia
Volume :
1
fYear :
2000
fDate :
2000
Firstpage :
124
Abstract :
We introduce an online algorithm for finding local maxima of the average reward in a partially observable Markov decision process (POMDP) controlled by a parameterized policy. Optimization is over the parameters of the policy. The algorithm's chief advantages are that it requires only a single sample path of the POMDP, it uses only one free parameter β∈(0, 1), which has a natural interpretation in terms of a bias-variance trade-off, and it requires no knowledge of the underlying state. In addition, the algorithm can be applied to infinite state, control and observation spaces. We prove almost-sure convergence of our algorithm, and show how the correct setting of β is related to the mixing time of the Markov chain induced by the POMDP.
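Sketch :
The abstract describes a policy-gradient estimator that works from a single sample path and uses one trace parameter β∈(0, 1). Below is a minimal illustrative sketch of how such an estimator can be organized, assuming a softmax policy over discrete observations and actions; the environment interface (step, toy_step) and all names are hypothetical and are not taken from the paper itself.

# Illustrative sketch only; interfaces and names are assumptions, not the paper's specification.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gradient_estimate(step, obs0, n_actions, theta, beta=0.9, T=10000, seed=0):
    """Estimate the gradient of the average reward from one sample path.

    step(obs, action, rng) -> (next_obs, reward) is a hypothetical environment
    interface. theta has shape (n_obs, n_actions) and parameterizes a softmax
    policy over actions given the current observation. beta in (0, 1) trades
    bias against variance in the eligibility trace.
    """
    rng = np.random.default_rng(seed)
    z = np.zeros_like(theta)        # eligibility trace
    delta = np.zeros_like(theta)    # running gradient estimate
    obs = obs0
    for t in range(1, T + 1):
        probs = softmax(theta[obs])
        action = rng.choice(n_actions, p=probs)
        # gradient of log pi_theta(action | obs) for a softmax policy
        grad_log = np.zeros_like(theta)
        grad_log[obs] = -probs
        grad_log[obs, action] += 1.0
        obs, reward = step(obs, action, rng)
        z = beta * z + grad_log             # discount old credit by beta
        delta += (reward * z - delta) / t   # running average of r_t * z_t
    return delta

# Tiny usage example with a hypothetical two-observation, two-action chain.
def toy_step(obs, action, rng):
    next_obs = rng.integers(2)
    reward = 1.0 if action == obs else 0.0
    return next_obs, reward

theta0 = np.zeros((2, 2))
grad = gradient_estimate(toy_step, obs0=0, n_actions=2, theta=theta0, beta=0.8, T=5000)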
Keywords :
Markov processes; convergence of numerical methods; mathematics computing; optimisation; probability; Markov chain; Markov decision process; convergence; stochastic optimization; Approximation algorithms; Convergence; Decision making; Dynamic programming; Dynamic scheduling; Machine learning; Process control; Stochastic processes; Uncertainty;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Proceedings of the 39th IEEE Conference on Decision and Control, 2000
Conference_Location :
Sydney, NSW
ISSN :
0191-2216
Print_ISBN :
0-7803-6638-7
Type :
conf
DOI :
10.1109/CDC.2000.912744
Filename :
912744