DocumentCode
1743791
Title
Stochastic optimization of controlled partially observable Markov decision processes
Author
Bartlett, Peter L. ; Baxter, Jonathan
Author_Institution
Res. Sch. of Inf. Sci. & Eng., Australian Nat. Univ., ACT, Australia
Volume
1
fYear
2000
fDate
2000
Firstpage
124
Abstract
We introduce an online algorithm for finding local maxima of the average reward in a partially observable Markov decision process (POMDP) controlled by a parameterized policy. Optimization is over the parameters of the policy. The algorithm's chief advantages are that it requires only a single sample path of the POMDP, it uses only one free parameter β∈(0, 1), which has a natural interpretation in terms of a bias-variance trade-off, and it requires no knowledge of the underlying state. In addition, the algorithm can be applied to infinite state, control and observation spaces. We prove almost-sure convergence of our algorithm, and show how the correct setting of β is related to the mixing time of the Markov chain induced by the POMDP.
Keywords
Markov processes; convergence of numerical methods; mathematics computing; optimisation; probability; Markov chain; Markov decision process; convergence; stochastic optimization; Approximation algorithms; Decision making; Dynamic programming; Dynamic scheduling; Machine learning; Process control; Stochastic processes; USA Councils; Uncertainty
fLanguage
English
Publisher
IEEE
Conference_Titel
Decision and Control, 2000. Proceedings of the 39th IEEE Conference on
Conference_Location
Sydney, NSW
ISSN
0191-2216
Print_ISBN
0-7803-6638-7
Type
conf
DOI
10.1109/CDC.2000.912744
Filename
912744