• DocumentCode
    3166138
  • Title

    Solving MDPs using Two-timescale Simulated Annealing with Multiplicative Weights

  • Author

    Abdulla, Mohammed Shahid ; Bhatnagar, Shalabh

  • Author_Institution
    Indian Inst. of Sci., Bangalore
  • fYear
    2007
  • fDate
    9-13 July 2007
  • Firstpage
    2428
  • Lastpage
    2433
  • Abstract
    We develop extensions of the simulated annealing with multiplicative weights (SAMW) algorithm, which was proposed as a solution method for finite-horizon Markov decision processes (FH-MDPs). The extensions are in three directions: a) use of the dynamic programming principle in the policy update step of SAMW; b) a two-timescale actor-critic algorithm that uses simulated transitions alone; and c) extension of the algorithm to the infinite-horizon discounted-reward scenario. In particular, a) reduces the storage required from exponential to linear in the number of actions per stage-state pair. On the faster timescale, a 'critic' recursion performs policy evaluation, while on the slower timescale an 'actor' recursion performs policy improvement using SAMW. We give a proof outlining convergence w.p. 1 and show experimental results in two settings: semiconductor fabrication and flow control in communication networks.
  • Keywords
    Markov processes; dynamic programming; learning (artificial intelligence); simulated annealing; communication networks; dynamic programming principle; finite-horizon Markov decision processes; flow control; infinite-horizon discounted-reward scenario; multiplicative weights; semiconductor fabrication; two-timescale actor-critic algorithm; two-timescale simulated annealing; Communication system control; Computational modeling; Computer simulation; Convergence; Learning; Materials requirements planning; Performance evaluation; Recursive estimation; Simulated annealing; Stochastic processes; Markov decision processes; Simulated Annealing with Multiplicative Weights; reinforcement learning; two timescale stochastic approximation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    American Control Conference, 2007. ACC '07
  • Conference_Location
    New York, NY
  • ISSN
    0743-1619
  • Print_ISBN
    1-4244-0988-8
  • Electronic_ISBN
    0743-1619
  • Type

    conf

  • DOI
    10.1109/ACC.2007.4282586
  • Filename
    4282586