Title of article

An actor–critic algorithm with function approximation for discounted cost constrained Markov decision processes

Author/Authors

Bhatnagar, Shalabh

Issue Information

Monthly, with serial issue number, year 2010

Pages

From page

760

To page

766

Abstract

We develop in this article the first actor–critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm makes use of multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that makes a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy.
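As a rough illustration of two ingredients named in the abstract, the sketch below shows a generic two-sided SPSA gradient estimate applied to a Lagrangian. It is not the algorithm from the article: the helper names `spsa_gradient` and `lagrangian`, the quadratic toy cost, the single toy constraint, and the perturbation size `delta` are all assumptions chosen for illustration, and the article's TD critic and multi-timescale step sizes are omitted.

```python
import numpy as np

def spsa_gradient(f, theta, delta=1e-2, rng=None):
    """Two-sided SPSA estimate of the gradient of f at theta (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    # Rademacher perturbation: each component is +1 or -1 with equal probability.
    perturb = rng.choice([-1.0, 1.0], size=theta.shape)
    f_plus = f(theta + delta * perturb)
    f_minus = f(theta - delta * perturb)
    # The same two evaluations of f yield every component of the estimate.
    return (f_plus - f_minus) / (2.0 * delta * perturb)

def lagrangian(theta, lam):
    """Toy Lagrangian (assumption): quadratic cost plus one constraint g(theta) <= 0."""
    cost = np.sum(theta ** 2)       # stand-in for the discounted cost objective
    constraint = 1.0 - theta[0]     # stand-in for a single inequality constraint
    return cost + lam * constraint

# Usage: a single estimate is noisy, so average many to see the underlying gradient.
rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0])
estimates = [spsa_gradient(lambda th: lagrangian(th, 0.5), theta, rng=rng)
             for _ in range(20000)]
print(np.mean(estimates, axis=0))
```

Each estimate needs only two evaluations of the function regardless of the dimension of `theta`, which is the usual reason SPSA is described as efficient. In a Lagrangian scheme of this kind, the policy parameters would typically descend along such an estimate while the multiplier `lam` ascends on a slower timescale and is kept nonnegative.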

Keywords

Infinite horizon discounted cost criterion, Function approximation, Simultaneous perturbation stochastic approximation, Actor–critic algorithm, Constrained Markov decision processes

Journal title

Systems and Control Letters

Serial Year

2010

Record number

1675604

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=10&DC=1675604