• DocumentCode
    1743791
  • Title
    Stochastic optimization of controlled partially observable Markov decision processes
  • Author
    Bartlett, Peter L.; Baxter, Jonathan
  • Author_Institution
    Res. Sch. of Inf. Sci. & Eng., Australian Nat. Univ., ACT, Australia
  • Volume
    1
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    124
  • Abstract
    We introduce an online algorithm for finding local maxima of the average reward in a partially observable Markov decision process (POMDP) controlled by a parameterized policy. Optimization is over the parameters of the policy. The algorithm's chief advantages are that it requires only a single sample path of the POMDP, it uses only one free parameter β ∈ (0, 1), which has a natural interpretation in terms of a bias-variance trade-off, and it requires no knowledge of the underlying state. In addition, the algorithm can be applied to infinite state, control and observation spaces. We prove almost-sure convergence of our algorithm, and show how the correct setting of β is related to the mixing time of the Markov chain induced by the POMDP.
  • Keywords
    Markov processes; convergence of numerical methods; mathematics computing; optimisation; probability; Markov chain; Markov decision process; convergence; stochastic optimization; Approximation algorithms; Convergence; Decision making; Dynamic programming; Dynamic scheduling; Machine learning; Process control; Stochastic processes; USA Councils; Uncertainty;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Decision and Control, 2000. Proceedings of the 39th IEEE Conference on
  • Conference_Location
    Sydney, NSW
  • ISSN
    0191-2216
  • Print_ISBN
    0-7803-6638-7
  • Type
    conf
  • DOI
    10.1109/CDC.2000.912744
  • Filename
    912744
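The abstract describes the online update only in words. As a hedged illustration, the Python sketch below assumes the common eligibility-trace form of an online policy-gradient step, z_{t+1} = β z_t + ∇ log μ(u_t | θ, y_t) and θ_{t+1} = θ_t + γ r(X_{t+1}) z_{t+1}. That update form, the two-state toy POMDP, the softmax policy, the step size γ (`step`) and the names `online_policy_gradient` and `average_reward` are all assumptions for illustration and are not taken from this record. What it shows is that the learner reads only observations, actions and rewards from one sample path, never the hidden state, and that β controls how far back the trace credits earlier actions.

```python
import numpy as np

# Hypothetical toy POMDP (an assumption for illustration, not from the paper):
# hidden state s in {0, 1}; observation y equals s with probability 0.8;
# action u in {0, 1}; the next state equals u with probability 0.9,
# otherwise it is drawn uniformly; reward r(s) = 1 if s == 1 else 0.


def softmax(v):
    e = np.exp(v - v.max())  # subtract the max for numerical stability
    return e / e.sum()


def online_policy_gradient(beta=0.9, step=0.01, steps=20_000, seed=0):
    """Hypothetical eligibility-trace policy-gradient ascent on one sample path.

    The parameter update reads only the observation, the chosen action and
    the reward; the hidden state is used solely to simulate the environment.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros((2, 2))  # theta[y, u]: softmax preference for action u given obs y
    z = np.zeros_like(theta)  # eligibility trace, discounted by beta
    state = 0
    for _ in range(steps):
        obs = state if rng.random() < 0.8 else 1 - state
        probs = softmax(theta[obs])
        action = rng.choice(2, p=probs)
        # Gradient of log mu(action | theta, obs) for a softmax policy:
        # only row `obs` is non-zero, equal to onehot(action) - probs.
        grad_log = np.zeros_like(theta)
        grad_log[obs] = -probs
        grad_log[obs, action] += 1.0
        z = beta * z + grad_log
        # Environment step (hidden from the learner).
        state = action if rng.random() < 0.9 else rng.integers(2)
        reward = 1.0 if state == 1 else 0.0
        # Step along the beta-dependent gradient estimate reward * z.
        theta += step * reward * z
    return theta


def average_reward(theta):
    """Exact long-run average reward of the softmax policy in the toy POMDP."""
    pi1 = [softmax(theta[y])[1] for y in (0, 1)]  # P(action 1 | obs y)
    # P(action 1 | hidden state s), averaging over the noisy observation.
    act1 = [0.8 * pi1[s] + 0.2 * pi1[1 - s] for s in (0, 1)]
    p_to_1 = [0.9 * a + 0.05 for a in act1]  # P(next state = 1 | state s)
    p01, p10 = p_to_1[0], 1.0 - p_to_1[1]
    return p01 / (p01 + p10)  # stationary probability of state 1


theta = online_policy_gradient()
print(average_reward(np.zeros((2, 2))), average_reward(theta))
```

The first printed value is the average reward of the uniform starting policy and the second that of the learned policy. Values of β closer to 1 let the trace credit actions whose effects appear later, at the cost of a noisier estimate; in this toy chain an action affects the reward at the very next step, so a moderate β is enough.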