• Title of article

    An actor–critic algorithm with function approximation for discounted cost constrained Markov decision processes

  • Author/Authors

    Bhatnagar, Shalabh (author)

  • Issue Information
    Monthly, serial issue no., year 2010
  • Pages
    7
  • From page
    760
  • To page
    766
  • Abstract
    We develop in this article the first actor–critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm makes use of multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that makes a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy.
  • Keywords
    Infinite horizon discounted cost criterion, function approximation, Simultaneous perturbation stochastic approximation, Actor–critic algorithm, Constrained Markov decision processes
  • Journal title
    Systems and Control Letters
  • Serial Year
    2010
  • Record number

    1675604
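The abstract combines Lagrange multipliers for the inequality constraints with an actor that searches the parameter space using SPSA gradient estimates, all run as multi-timescale stochastic approximation. The following is a minimal Python sketch of such a primal-dual SPSA loop, under stated assumptions: the TD critic is omitted, and the hypothetical functions `cost` and `constraint` are deterministic toy functions standing in for the critic's estimates of the discounted objective and constraint costs. The multiplier is placed on the slower timescale (an assumed arrangement, not stated in the abstract), and the constant step sizes `a`, `b` and perturbation size `delta` are illustrative choices, not the article's step-size schedules. This is not the article's algorithm.

```python
import numpy as np


def cost(theta):
    # Hypothetical stand-in for the discounted objective J(theta).
    return (theta[0] - 2.0) ** 2 + (theta[1] - 1.0) ** 2


def constraint(theta):
    # Hypothetical stand-in for a constraint of the form G(theta) <= 0.
    return theta[0] + theta[1] - 1.0


def lagrangian(theta, lam):
    # L(theta, lam) = J(theta) + lam * G(theta)
    return cost(theta) + lam * constraint(theta)


def spsa_gradient(fn, theta, delta, rng):
    # Two-sided SPSA: perturb every coordinate at once along a random
    # Rademacher (+1/-1) direction, using only two evaluations of fn.
    pert = 2.0 * rng.integers(0, 2, size=theta.shape) - 1.0
    diff = fn(theta + delta * pert) - fn(theta - delta * pert)
    return diff / (2.0 * delta * pert)


def primal_dual_spsa(n_iters=20000, a=0.05, b=0.005, delta=0.1, seed=0):
    # Illustrative constant step sizes; b << a separates the two timescales.
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    lam = 0.0
    for _ in range(n_iters):
        grad = spsa_gradient(lambda t: lagrangian(t, lam), theta, delta, rng)
        g = constraint(theta)
        # Faster timescale: gradient descent on the Lagrangian in theta.
        theta = theta - a * grad
        # Slower timescale: projected ascent in lam, keeping lam >= 0.
        lam = max(0.0, lam + b * g)
    return theta, lam


if __name__ == "__main__":
    theta, lam = primal_dual_spsa()
    print(theta, lam)
```

For this toy problem (minimize (θ0 − 2)² + (θ1 − 1)² subject to θ0 + θ1 ≤ 1) the loop should settle near θ = (1, 0) with λ = 2, the point where the gradient of the Lagrangian vanishes. Keeping `b` much smaller than `a` lets θ track the minimizer of the Lagrangian for the current λ while λ drifts slowly toward a value that makes the constraint hold.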