DocumentCode :
2031294
Title :
Reinforcement learning for MDPs using temporal difference schemes
Author :
Thomas, Abraham ; Marcus, Steven I.
Author_Institution :
Dept. of Electr. Eng., Maryland Univ., College Park, MD, USA
Volume :
1
fYear :
1997
fDate :
10-12 Dec 1997
Firstpage :
577
Abstract :
In this paper we propose a reinforcement learning scheme for finding optimal and suboptimal policies for the finite-state Markov decision problem (MDP) with the infinite horizon discounted cost criterion. Online learning is used together with temporal difference schemes for approximating value functions, yielding a direct adaptive control scheme for the MDP. The approach approximates stationary deterministic policies with randomized policies. We provide convergence results for the algorithm under mild assumptions, in particular without aperiodicity assumptions.
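For orientation only, the sketch below illustrates the general idea the abstract describes (temporal difference value approximation combined with a randomized policy on a discounted finite MDP). It is not the paper's algorithm; the tabular TD(0) update, softmax policy, toy MDP, and all parameter names are assumptions made for illustration.

```python
# Minimal illustrative sketch (assumed setup, not the authors' method):
# tabular TD(0) value estimation with a softmax (randomized) policy
# on a small randomly generated finite MDP with discounted cost.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 3
gamma = 0.9   # discount factor
alpha = 0.1   # TD step size
tau = 0.5     # softmax temperature controlling randomization

# Random transition kernel P[s, a] -> distribution over next states,
# and random per-stage costs c[s, a].
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)
c = rng.random((n_states, n_actions))

V = np.zeros(n_states)   # value-function estimate
s = 0

for t in range(20000):
    # One-step lookahead costs under the current value estimate.
    q = c[s] + gamma * P[s] @ V
    # Randomized (softmax) policy: lower estimated cost -> higher probability.
    probs = np.exp(-q / tau)
    probs /= probs.sum()
    a = rng.choice(n_actions, p=probs)

    # Sample the next state and incur the stage cost.
    s_next = rng.choice(n_states, p=P[s, a])
    cost = c[s, a]

    # TD(0) update toward the sampled one-step target.
    V[s] += alpha * (cost + gamma * V[s_next] - V[s])
    s = s_next

print("Estimated values:", np.round(V, 3))
```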
Keywords :
Markov processes; adaptive control; convergence; decision theory; function approximation; learning (artificial intelligence); learning systems; Markov decision problem; convergence; direct adaptive control; function approximation; infinite horizon discounted cost criterion; online learning; reinforcement learning; temporal difference; Adaptive control; Control systems; Cost function; Educational institutions; Infinite horizon; Kernel; Learning; State-space methods; Stochastic processes; Topology;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Proceedings of the 36th IEEE Conference on Decision and Control, 1997
Conference_Location :
San Diego, CA
ISSN :
0191-2216
Print_ISBN :
0-7803-4187-2
Type :
conf
DOI :
10.1109/CDC.1997.650692
Filename :
650692