Regional Information Center for Science and Technology - A simultaneous perturbation stochastic approximation-based actor-critic algorithm for Markov decision processes

DocumentCode :

954976

Title :

A simultaneous perturbation stochastic approximation-based actor-critic algorithm for Markov decision processes

Author :

Bhatnagar, Shalabh ; Kumar, Shishir

Author_Institution :

Dept. of Comput. Sci. & Autom., Indian Inst. of Sci., Bangalore, India

Volume :

Issue :

fYear :

2004

fDate :

4/1/2004

Firstpage :

592

Lastpage :

598

Abstract :

A two-timescale, simulation-based actor-critic algorithm is proposed for solving infinite-horizon Markov decision processes with finite state spaces and compact action spaces under the discounted cost criterion. On the slower timescale, the algorithm performs a gradient search in the space of deterministic policies, using simultaneous perturbation stochastic approximation (SPSA) gradient estimates. On the faster timescale, the value function corresponding to a given stationary policy is updated and averaged over a fixed number of epochs for enhanced performance. A proof of convergence to a locally optimal policy is presented. Finally, numerical experiments are reported in which the proposed algorithm is applied to flow control in a bottleneck link, using a continuous-time queueing model.
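The two-timescale idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes a generic two-sided SPSA gradient estimate, a hypothetical three-state toy model (`simulate`), an action set [0, 1], a discount factor of 0.9, and step sizes 1/n (actor) and 1/n^0.6 (critic), none of which are given in this record. All function and parameter names are illustrative.

```python
import random

GAMMA = 0.9    # discount factor (assumed value, not from the record)
N_STATES = 3   # size of the toy finite state space (assumed)


def project(a, lo=0.0, hi=1.0):
    """Project an action onto the compact action set [lo, hi]."""
    return max(lo, min(hi, a))


def simulate(rng, s, a):
    """Hypothetical toy model: one simulated transition, returns (cost, next_state)."""
    cost = (a - 0.2 * (s + 1)) ** 2 + 0.1 * s
    if rng.random() < a:
        nxt = max(s - 1, 0)
    else:
        nxt = min(s + 1, N_STATES - 1)
    return cost, nxt


def critic(rng, policy, v, epochs, step):
    """Faster timescale: refine the value estimate of a fixed policy over several epochs."""
    for _ in range(epochs):
        for s in range(N_STATES):
            cost, nxt = simulate(rng, s, policy[s])
            # Stochastic-approximation update toward the discounted one-step target.
            v[s] += step * (cost + GAMMA * v[nxt] - v[s])


def spsa_actor_critic(iterations=2000, epochs=20, delta=0.1, seed=0):
    """Slower timescale: SPSA gradient search over deterministic policies."""
    rng = random.Random(seed)
    theta = [0.5] * N_STATES      # deterministic policy: one action per state
    v_plus = [0.0] * N_STATES     # critic for the policy perturbed by +delta
    v_minus = [0.0] * N_STATES    # critic for the policy perturbed by -delta
    for n in range(1, iterations + 1):
        a_n = 1.0 / n             # actor step size (decays faster)
        b_n = 1.0 / n ** 0.6      # critic step size (decays slower)
        # Perturb every policy component at once with random +/-1 signs.
        signs = [rng.choice((-1.0, 1.0)) for _ in range(N_STATES)]
        plus = [project(t + delta * d) for t, d in zip(theta, signs)]
        minus = [project(t - delta * d) for t, d in zip(theta, signs)]
        critic(rng, plus, v_plus, epochs, b_n)
        critic(rng, minus, v_minus, epochs, b_n)
        for s in range(N_STATES):
            # Two-sided SPSA estimate of the cost gradient for component s.
            grad = (v_plus[s] - v_minus[s]) / (2.0 * delta * signs[s])
            theta[s] = project(theta[s] - a_n * grad)
    return theta


if __name__ == "__main__":
    print(spsa_actor_critic())
```

Because the actor step size decays faster than the critic step size, the policy appears nearly fixed to the critic, which is the intuition behind the two-timescale separation. The projection keeps every action inside the compact action set after each update.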

Keywords :

Markov processes; approximation theory; continuous time systems; convergence; gradient methods; perturbation techniques; queueing theory; search problems; Markov decision processes; actor-critic algorithm; compact action spaces; continuous time queueing model; discounted cost criterion; finite state; flow control; gradient search; simultaneous perturbation stochastic approximation; Automatic control; Circuits; Control system synthesis; Control systems; Feedback; Modules (abstract algebra); Multidimensional systems; Polynomials; Stochastic processes; System analysis and design

fLanguage :

English

Journal_Title :

Automatic Control, IEEE Transactions on

Publisher :

IEEE

ISSN :

0018-9286

Type :

jour

DOI :

10.1109/TAC.2004.825622

Filename :

1284724

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=954976