Title :
Reinforcement learning for MDPs using temporal difference schemes
Author :
Thomas, Abraham; Marcus, Steven I.
Author_Institution :
Dept. of Electr. Eng., Univ. of Maryland, College Park, MD, USA
Abstract :
In this paper we propose a reinforcement learning scheme for finding optimal and sub-optimal policies for the finite-state Markov decision problem (MDP) with the infinite-horizon discounted cost criterion. Online learning is combined with temporal difference schemes for approximating value functions, yielding a direct adaptive control scheme for the MDP. A key feature of the approach is the approximation of stationary deterministic policies by randomized policies. We provide convergence results for the algorithm under mild assumptions; in particular, no aperiodicity assumption is required.
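For context, the temporal difference idea the abstract refers to can be illustrated with a minimal sketch of tabular TD(0) policy evaluation on a toy finite-state MDP with discounted costs. This is a generic textbook illustration, not the algorithm proposed in the paper; the toy model, the randomized policy pi, and all parameter choices (gamma, the step-size schedule) are assumptions made here for demonstration.

import numpy as np

# Minimal tabular TD(0) policy evaluation on a toy finite-state MDP with
# discounted costs. Illustrative only -- not the paper's algorithm.
rng = np.random.default_rng(0)

n_states, n_actions = 4, 2
gamma = 0.9  # discount factor of the infinite-horizon discounted cost criterion

# Toy model (assumed for the demo): P[a, s, s'] transition probabilities,
# c[s, a] one-step costs.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
c = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# A fixed randomized stationary policy: pi[s, a] = Prob(action a | state s).
pi = rng.dirichlet(np.ones(n_actions), size=n_states)

V = np.zeros(n_states)        # value-function estimate
visits = np.zeros(n_states)   # per-state visit counts for the step sizes
s = 0
for _ in range(200_000):
    a = rng.choice(n_actions, p=pi[s])
    s_next = rng.choice(n_states, p=P[a, s])
    visits[s] += 1
    alpha = visits[s] ** -0.6  # diminishing step size (Robbins-Monro type)
    # TD(0) update: move V(s) toward the one-step bootstrapped cost estimate.
    V[s] += alpha * (c[s, a] + gamma * V[s_next] - V[s])
    s = s_next

# Sanity check against the exact solution V_pi = (I - gamma * P_pi)^{-1} c_pi.
P_pi = np.einsum('sa,ast->st', pi, P)
c_pi = np.einsum('sa,sa->s', pi, c)
V_exact = np.linalg.solve(np.eye(n_states) - gamma * P_pi, c_pi)
print("TD(0) estimate:", np.round(V, 3))
print("exact         :", np.round(V_exact, 3))

Note that the toy chain sampled above happens to be irreducible and aperiodic; the paper's contribution is precisely that its convergence results do not require the aperiodicity part.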
Keywords :
Markov processes; adaptive control; convergence; decision theory; function approximation; learning (artificial intelligence); learning systems; Markov decision problem; direct adaptive control; infinite horizon discounted cost criterion; online learning; reinforcement learning; temporal difference; Control systems; Cost function; Infinite horizon; Kernel; State-space methods; Stochastic processes; Topology
Conference_Title :
Proceedings of the 36th IEEE Conference on Decision and Control, 1997
Conference_Location :
San Diego, CA
Print_ISBN :
0-7803-4187-2
DOI :
10.1109/CDC.1997.650692