Regional Information Center for Science and Technology - Stochastic optimization of controlled partially observable Markov decision processes

DocumentCode :

1743791

Title :

Stochastic optimization of controlled partially observable Markov decision processes

Author :

Bartlett, Peter L. ; Baxter, Jonathan

Author_Institution :

Res. Sch. of Inf. Sci. & Eng., Australian Nat. Univ., ACT, Australia

Volume :

fYear :

2000

fDate :

2000

Firstpage :

124

Abstract :

We introduce an online algorithm for finding local maxima of the average reward in a partially observable Markov decision process (POMDP) controlled by a parameterized policy. Optimization is over the parameters of the policy. The algorithm's chief advantages are that it requires only a single sample path of the POMDP, it uses only one free parameter β ∈ (0, 1), which has a natural interpretation in terms of a bias-variance trade-off, and it requires no knowledge of the underlying state. In addition, the algorithm can be applied to infinite state, control and observation spaces. We prove almost-sure convergence of our algorithm, and show how the correct setting of β is related to the mixing time of the Markov chain induced by the POMDP.
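The following is a minimal sketch, not the authors' published algorithm, of an online policy-gradient update of the kind the abstract describes: the parameters of a softmax policy are adjusted by stochastic gradient ascent along a single sample path, using an eligibility trace z discounted by β, and the learner never reads the hidden state. The two-state POMDP (toy_pomdp_step), the tabular softmax parameterization over observations, the constant step size, and all function names are hypothetical choices made for illustration; almost-sure convergence results of this kind typically assume diminishing step sizes instead.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def toy_pomdp_step(state, action, rng, flip_prob=0.1, obs_noise=0.2):
    """Hypothetical two-state POMDP used only for illustration.

    Reward is 1 when the action matches the hidden state, else 0. The hidden
    state then flips with probability flip_prob, and the agent observes the
    new state correctly with probability 1 - obs_noise.
    """
    reward = 1.0 if action == state else 0.0
    next_state = 1 - state if rng.random() < flip_prob else state
    obs = next_state if rng.random() >= obs_noise else 1 - next_state
    return reward, next_state, obs


def online_policy_gradient(beta=0.9, step_size=0.01, steps=20000, seed=0):
    """Online eligibility-trace policy-gradient sketch on one sample path.

    theta[y][a] is the softmax preference for action a given observation y.
    Small beta gives a lower-variance but potentially more biased gradient
    estimate; beta near 1 reduces bias at the cost of higher variance.
    Returns the final parameters and the average reward along the path.
    """
    rng = random.Random(seed)
    n_obs, n_actions = 2, 2
    theta = [[0.0] * n_actions for _ in range(n_obs)]
    z = [[0.0] * n_actions for _ in range(n_obs)]  # eligibility trace
    state, obs = 0, 0  # hidden state is used only by the simulator
    total_reward = 0.0
    for _ in range(steps):
        probs = softmax(theta[obs])
        action = rng.choices(range(n_actions), weights=probs)[0]
        # Decay the whole trace by beta, then add grad log mu(action | obs),
        # which for a tabular softmax is nonzero only in row `obs`.
        for y in range(n_obs):
            for a in range(n_actions):
                z[y][a] *= beta
        for a in range(n_actions):
            z[obs][a] += (1.0 if a == action else 0.0) - probs[a]
        reward, state, obs = toy_pomdp_step(state, action, rng)
        total_reward += reward
        # Stochastic gradient-ascent step along reward * trace.
        for y in range(n_obs):
            for a in range(n_actions):
                theta[y][a] += step_size * reward * z[y][a]
    return theta, total_reward / steps
```

For example, online_policy_gradient(beta=0.5) should learn to choose the action that matches the current observation. Because actions in this toy do not affect the hidden-state dynamics, even β = 0 gives an unbiased gradient estimate here; β matters when actions have delayed effects, where the trace's effective horizon of roughly 1/(1 − β) steps must cover those effects, which is presumably why the abstract ties the choice of β to the mixing time of the chain.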

Keywords :

Markov processes; convergence of numerical methods; mathematics computing; optimisation; probability; Markov chain; Markov decision process; convergence; stochastic optimization; Approximation algorithms; Convergence; Decision making; Dynamic programming; Dynamic scheduling; Machine learning; Process control; Stochastic processes; USA Councils; Uncertainty

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Proceedings of the 39th IEEE Conference on Decision and Control, 2000

Conference_Location :

Sydney, NSW

ISSN :

0191-2216

Print_ISBN :

0-7803-6638-7

Type :

conf

DOI :

10.1109/CDC.2000.912744

Filename :

912744

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1743791