Title :
Reinforcement learning for MDPs using temporal difference schemes
Author :
Thomas, Abraham; Marcus, Steven I.
Author_Institution :
Dept. of Electr. Eng., Univ. of Maryland, College Park, MD, USA
Abstract :
In this paper we propose a reinforcement learning scheme for finding optimal and sub-optimal policies for the finite-state Markov decision problem (MDP) with the infinite-horizon discounted cost criterion. Online learning is combined with temporal difference schemes for approximating value functions, yielding a direct adaptive control scheme for the MDP. A key feature of the approach is the approximation of stationary deterministic policies by randomized policies. We provide convergence results for the algorithm under mild assumptions; in particular, no aperiodicity assumption is required.
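For context, the temporal difference idea the abstract refers to can be illustrated with a minimal sketch of tabular TD(0) policy evaluation on a toy finite-state MDP with discounted costs. This is a generic textbook illustration, not the algorithm proposed in the paper; the toy model, the randomized policy pi, and all parameter choices (gamma, the step-size schedule) are assumptions made here for demonstration.

import numpy as np

# Minimal tabular TD(0) policy evaluation on a toy finite-state MDP with
# discounted costs. Illustrative only -- not the paper's algorithm.
rng = np.random.default_rng(0)

n_states, n_actions = 4, 2
gamma = 0.9  # discount factor of the infinite-horizon discounted cost criterion

# Toy model (assumed for the demo): P[a, s, s'] transition probabilities,
# c[s, a] one-step costs.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
c = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# A fixed randomized stationary policy: pi[s, a] = Prob(action a | state s).
pi = rng.dirichlet(np.ones(n_actions), size=n_states)

V = np.zeros(n_states)        # value-function estimate
visits = np.zeros(n_states)   # per-state visit counts for the step sizes
s = 0
for _ in range(200_000):
    a = rng.choice(n_actions, p=pi[s])
    s_next = rng.choice(n_states, p=P[a, s])
    visits[s] += 1
    alpha = visits[s] ** -0.6  # diminishing step size (Robbins-Monro type)
    # TD(0) update: move V(s) toward the one-step bootstrapped cost estimate.
    V[s] += alpha * (c[s, a] + gamma * V[s_next] - V[s])
    s = s_next

# Sanity check against the exact solution V_pi = (I - gamma * P_pi)^{-1} c_pi.
P_pi = np.einsum('sa,ast->st', pi, P)
c_pi = np.einsum('sa,sa->s', pi, c)
V_exact = np.linalg.solve(np.eye(n_states) - gamma * P_pi, c_pi)
print("TD(0) estimate:", np.round(V, 3))
print("exact         :", np.round(V_exact, 3))

Note that the toy chain sampled above happens to be irreducible and aperiodic; the paper's contribution is precisely that its convergence results do not require the aperiodicity part.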
Keywords :
Markov processes; adaptive control; convergence; decision theory; function approximation; learning (artificial intelligence); learning systems; Markov decision problem; direct adaptive control; infinite horizon discounted cost criterion; online learning; reinforcement learning; temporal difference; Control systems; Cost function; Infinite horizon; Kernel; State-space methods; Stochastic processes; Topology
Conference_Title :
Proceedings of the 36th IEEE Conference on Decision and Control, 1997
Conference_Location :
San Diego, CA
Print_ISBN :
0-7803-4187-2
DOI :
10.1109/CDC.1997.650692