DocumentCode :
2226545
Title :
Direct gradient-based reinforcement learning
Author :
Baxter, Jonathan; Bartlett, Peter L.
Author_Institution :
Res. Sch. of Inf. Sci. & Eng., Australian Nat. Univ., Canberra, ACT, Australia
Volume :
3
fYear :
2000
fDate :
2000
Firstpage :
271
Abstract :
Many control, scheduling, planning and game-playing tasks can be formulated as reinforcement learning problems, in which an agent chooses actions to take in some environment, aiming to maximize a reward function. We present an algorithm for computing approximations to the gradient of the average reward from a single sample path of a controlled partially observable Markov decision process. We show that the accuracy of these approximations depends on the relationship between a time constant used by the algorithm and the mixing time of the Markov chain, and that the error can be made arbitrarily small by setting the time constant suitably large. We prove that the algorithm converges with probability 1.
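To make the abstract's estimator concrete, the following minimal Python sketch shows how a single-sample-path gradient estimate of this kind can be accumulated with a discounted eligibility trace, where the discount beta plays the role of the time constant mentioned above. This is an illustration under assumptions of our own, not the paper's published pseudocode; softmax_policy, gpomdp_estimate, step, and all parameter names are hypothetical.

    import numpy as np

    def softmax_policy(theta, obs):
        # Action probabilities and grad of log pi(a | obs) for each action.
        logits = theta[obs]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        grads = []
        for a in range(len(probs)):
            g = np.zeros_like(theta)
            g[obs] = -probs          # d log pi / d theta[obs, :]
            g[obs, a] += 1.0
            grads.append(g)
        return probs, grads

    def gpomdp_estimate(step, theta, obs0, beta=0.95, T=100_000, seed=0):
        # Single-sample-path estimate of the average-reward gradient.
        # z is a discounted eligibility trace; delta is the running
        # average of r_t * z_t.  beta is the time constant from the
        # abstract: pushing beta toward 1 shrinks the bias (relative
        # to the chain's mixing time) at the cost of higher variance.
        rng = np.random.default_rng(seed)
        z = np.zeros_like(theta)
        delta = np.zeros_like(theta)
        obs = obs0
        for t in range(1, T + 1):
            probs, grads = softmax_policy(theta, obs)
            a = rng.choice(len(probs), p=probs)
            obs, r = step(obs, a, rng)
            z = beta * z + grads[a]
            delta += (r * z - delta) / t
        return delta

    # Hypothetical two-state world: action 0 stays, action 1 switches
    # state; reward 1 is earned only in state 1.
    def step(obs, a, rng):
        nxt = obs if a == 0 else 1 - obs
        return nxt, float(nxt == 1)

    theta = np.zeros((2, 2))
    print(gpomdp_estimate(step, theta, obs0=0))

On this toy chain the estimate should favor action 1 in state 0 and action 0 in state 1, i.e. the direction that increases the average reward, matching the abstract's claim that the bias of the estimate is controlled by the time constant relative to the mixing time.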
Keywords :
Markov processes; game theory; gradient methods; learning (artificial intelligence); probability; Markov chain; agent; direct gradient-based reinforcement learning; game-playing tasks; mixing time; partially observable Markov decision process; probability; reward function; sample path; time constant; Adaptive control; Approximation algorithms; Convergence; Discrete event systems; Equations; Learning; Probability distribution; Processor scheduling; State-space methods; Stochastic processes
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Proceedings of the 2000 IEEE International Symposium on Circuits and Systems (ISCAS 2000), Geneva
Conference_Location :
Geneva
Print_ISBN :
0-7803-5482-6
Type :
conf
DOI :
10.1109/ISCAS.2000.856049
Filename :
856049