DocumentCode :
2226545
Title :
Direct gradient-based reinforcement learning
Author :
Baxter, Jonathan; Bartlett, Peter L.
Author_Institution :
Res. Sch. of Inf. Sci. & Eng., Australian Nat. Univ., Canberra, ACT, Australia
Volume :
3
fYear :
2000
fDate :
2000
Firstpage :
271
Abstract :
Many control, scheduling, planning and game-playing tasks can be formulated as reinforcement learning problems, in which an agent chooses actions to take in some environment, aiming to maximize a reward function. We present an algorithm for computing approximations to the gradient of the average reward from a single sample path of a controlled partially observable Markov decision process. We show that the accuracy of these approximations depends on the relationship between a time constant used by the algorithm and the mixing time of the Markov chain, and that the error can be made arbitrarily small by setting the time constant suitably large. We prove that the algorithm converges with probability 1.
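To make the abstract's estimator concrete, the following minimal Python sketch shows how a single-sample-path gradient estimate of this kind can be accumulated with a discounted eligibility trace, where the discount beta plays the role of the time constant mentioned above. This is an illustration under assumptions of our own, not the paper's published pseudocode; softmax_policy, gpomdp_estimate, step, and all parameter names are hypothetical.

    import numpy as np

    def softmax_policy(theta, obs):
        # Action probabilities and grad of log pi(a | obs) for each action.
        logits = theta[obs]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        grads = []
        for a in range(len(probs)):
            g = np.zeros_like(theta)
            g[obs] = -probs          # d log pi / d theta[obs, :]
            g[obs, a] += 1.0
            grads.append(g)
        return probs, grads

    def gpomdp_estimate(step, theta, obs0, beta=0.95, T=100_000, seed=0):
        # Single-sample-path estimate of the average-reward gradient.
        # z is a discounted eligibility trace; delta is the running
        # average of r_t * z_t.  beta is the time constant from the
        # abstract: pushing beta toward 1 shrinks the bias (relative
        # to the chain's mixing time) at the cost of higher variance.
        rng = np.random.default_rng(seed)
        z = np.zeros_like(theta)
        delta = np.zeros_like(theta)
        obs = obs0
        for t in range(1, T + 1):
            probs, grads = softmax_policy(theta, obs)
            a = rng.choice(len(probs), p=probs)
            obs, r = step(obs, a, rng)
            z = beta * z + grads[a]
            delta += (r * z - delta) / t
        return delta

    # Hypothetical two-state world: action 0 stays, action 1 switches
    # state; reward 1 is earned only in state 1.
    def step(obs, a, rng):
        nxt = obs if a == 0 else 1 - obs
        return nxt, float(nxt == 1)

    theta = np.zeros((2, 2))
    print(gpomdp_estimate(step, theta, obs0=0))

On this toy chain the estimate should favor action 1 in state 0 and action 0 in state 1, i.e. the direction that increases the average reward, matching the abstract's claim that the bias of the estimate is controlled by the time constant relative to the mixing time.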
Keywords :
Markov processes; game theory; gradient methods; learning (artificial intelligence); probability; Markov chain; agent; direct gradient-based reinforcement learning; game-playing tasks; mixing time; partially observable Markov decision process; probability; reward function; sample path; time constant; Adaptive control; Approximation algorithms; Convergence; Discrete event systems; Equations; Learning; Probability distribution; Processor scheduling; State-space methods; Stochastic processes
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Proceedings of the 2000 IEEE International Symposium on Circuits and Systems (ISCAS 2000), Geneva
Conference_Location :
Geneva
Print_ISBN :
0-7803-5482-6
Type :
conf
DOI :
10.1109/ISCAS.2000.856049
Filename :
856049