• DocumentCode
    3166165
  • Title
    Parametrized Actor-Critic Algorithms for Finite-Horizon MDPs
  • Author
    Abdulla, Mohammed Shahid ; Bhatnagar, Shalabh
  • Author_Institution
    Indian Inst. of Sci., Bangalore
  • fYear
    2007
  • fDate
    9-13 July 2007
  • Firstpage
    534
  • Lastpage
    539
  • Abstract
    Due to their non-stationarity, finite-horizon Markov decision processes (FH-MDPs) have one probability transition matrix per stage. Thus the curse of dimensionality affects FH-MDPs more severely than infinite-horizon MDPs. We propose two parametrized 'actor-critic' algorithms to compute optimal policies for FH-MDPs. Both algorithms use the two-timescale stochastic approximation technique, thus simultaneously performing gradient search in the parametrized policy space (the 'actor') on a slower timescale and learning the policy gradient (the 'critic') via a faster recursion. This is in contrast to methods where critic recursions learn the cost-to-go proper. We show w.p. 1 convergence to a set with the necessary condition for constrained optima. The proposed parametrization is for FH-MDPs with compact action sets, although certain exceptions can be handled. Further, a third algorithm for stochastic control of stopping time processes is presented. We explain why current policy evaluation methods do not work as critic to the proposed actor recursion. Simulation results from flow-control in communication networks attest to the performance advantages of all three algorithms.
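    The two-timescale idea in the abstract (a critic that estimates the gradient on a faster recursion, an actor that follows it on a slower, projected recursion) can be illustrated with a minimal sketch. This is not the authors' FH-MDP algorithm: it optimizes a hypothetical noisy quadratic cost over a compact box, and the SPSA-style two-sided perturbation used to form the raw gradient estimate, the step sizes a_n = 1/n and b_n = 1/n^0.6, the target point TARGET, and the helper names noisy_cost, project and two_timescale are all assumptions chosen for illustration.

```python
import random

# Hypothetical target parameter (assumption); it lies inside the box below.
TARGET = (0.7, -0.3)

def noisy_cost(theta, rng):
    """Hypothetical noisy cost: squared distance to TARGET plus Gaussian noise."""
    return sum((t - c) ** 2 for t, c in zip(theta, TARGET)) + rng.gauss(0.0, 0.1)

def project(theta, lo=-1.0, hi=1.0):
    """Project onto the compact box [lo, hi]^d so the actor stays in a compact set."""
    return [min(hi, max(lo, t)) for t in theta]

def two_timescale(n_iter=20000, delta=0.1, seed=0):
    rng = random.Random(seed)
    d = len(TARGET)
    theta = [0.0] * d  # actor: parameter updated on the slow timescale
    grad = [0.0] * d   # critic: running gradient estimate on the fast timescale
    for n in range(1, n_iter + 1):
        a_n = 1.0 / n         # actor step size
        b_n = 1.0 / n ** 0.6  # critic step size; a_n / b_n -> 0
        # Random +/-1 perturbation for a two-sided (SPSA-style) difference.
        pert = [rng.choice((-1.0, 1.0)) for _ in range(d)]
        c_plus = noisy_cost([t + delta * p for t, p in zip(theta, pert)], rng)
        c_minus = noisy_cost([t - delta * p for t, p in zip(theta, pert)], rng)
        g_hat = [(c_plus - c_minus) / (2.0 * delta * p) for p in pert]
        # Faster recursion: the critic averages the noisy gradient estimates.
        grad = [g + b_n * (gh - g) for g, gh in zip(grad, g_hat)]
        # Slower recursion: the actor takes a projected gradient-descent step.
        theta = project([t - a_n * g for t, g in zip(theta, grad)])
    return theta

if __name__ == "__main__":
    print(two_timescale())
```

    Because a_n / b_n tends to zero, the actor sees the critic's estimate as if it had already converged for the current parameter; the projection step is what ties such a scheme to constrained optima over a compact set.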
  • Keywords
    Markov processes; approximation theory; matrix algebra; stochastic systems; Markov decision processes; communication network flow-control; finite-horizon MDP; gradient search; parametrized actor-critic algorithms; parametrized policy space; policy evaluation methods; probability transition matrix; stochastic control; stopping time processes; two-timescale stochastic approximation technique; Approximation algorithms; Automation; Cities and towns; Communication system control; Computational modeling; Computer science; Convergence; Costs; Stochastic processes; Table lookup; Finite horizon Markov decision processes; actor-critic algorithms; reinforcement learning; two timescale stochastic approximation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    American Control Conference, 2007. ACC '07
  • Conference_Location
    New York, NY
  • ISSN
    0743-1619
  • Print_ISBN
    1-4244-0988-8
  • Electronic_ISBN
    0743-1619
  • Type
    conf
  • DOI
    10.1109/ACC.2007.4282587
  • Filename
    4282587