Regional Information Center for Science and Technology (RICEST) - Average cost temporal-difference learning

DocumentCode :

2030031

Title :

Average cost temporal-difference learning

Author :

Tsitsiklis, John N. ; Van Roy, Benjamin

Author_Institution :

Lab. for Inf. & Decision Syst., MIT, Cambridge, MA, USA

Volume :

fYear :

1997

fDate :

10-12 Dec 1997

Firstpage :

498

Abstract :

We describe a variant of temporal-difference learning that approximates average and differential costs of an irreducible aperiodic Markov chain. Approximations consist of linear combinations of fixed basis functions whose weights are incrementally updated during a single endless trajectory of the Markov chain. We present results concerning convergence and the limit of convergence. We also provide a bound on the resulting approximation error that exhibits an interesting dependence on the “mixing time” of the Markov chain. The results parallel previous work by the authors (1997), involving approximations of discounted cost-to-go.
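
The abstract names the ingredients (a running average-cost estimate, linear weights on fixed basis functions, incremental updates along one trajectory) but this record does not give the update rules. The following is a minimal sketch, assuming the commonly cited TD(lambda) form with an eligibility trace; it is not taken from this record. The two-state chain, the costs, the single basis function, the step-size schedules, and the name `average_cost_td` are all hypothetical choices made for illustration.

```python
import random

# Hypothetical two-state irreducible aperiodic chain (not from the record).
P = [[0.9, 0.1],
     [0.2, 0.8]]          # transition probabilities
G = [1.0, 4.0]            # per-state cost g(x)
PHI = [[0.0], [1.0]]      # one non-constant basis function phi(x)


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def average_cost_td(P, g, phi, lam=0.5, steps=200_000, seed=0):
    """Sketch of average-cost TD(lambda) with linear function approximation.

    Returns (mu, r): the running average-cost estimate and the weight vector,
    so that dot(phi[x], r) approximates the differential cost of state x
    (up to an additive constant).
    """
    rng = random.Random(seed)
    states = range(len(P))
    r = [0.0] * len(phi[0])   # weights on the basis functions
    x = 0                     # start of the single trajectory
    z = list(phi[x])          # eligibility trace
    mu = 0.0                  # average-cost estimate

    for t in range(steps):
        x_next = rng.choices(states, weights=P[x])[0]

        # Temporal difference: the cost is measured relative to the
        # current average-cost estimate; there is no discount factor.
        d = g[x] - mu + dot(phi[x_next], r) - dot(phi[x], r)

        # Assumed step-size schedule (diminishing, not from the record).
        gamma = 10.0 / (100.0 + t)
        r = [ri + gamma * d * zi for ri, zi in zip(r, z)]

        # Running mean of the costs observed so far.
        mu += (g[x] - mu) / (t + 1)

        # Decay the trace and add the features of the new state.
        z = [lam * zi + fi for zi, fi in zip(z, phi[x_next])]
        x = x_next

    return mu, r


mu_hat, r_hat = average_cost_td(P, G, PHI)
print("average cost estimate:", mu_hat)
print("weights:", r_hat)
```

For this toy chain the stationary distribution is (2/3, 1/3), so the true average cost is 2/3 * 1 + 1/3 * 4 = 2, which gives a reference value for `mu_hat`. The basis function is deliberately non-constant, since differential costs are only defined up to an additive constant.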

Keywords :

Markov processes; convergence; decision theory; dynamic programming; learning (artificial intelligence); approximation error; average cost temporal-difference learning; discounted cost-to-go; fixed basis functions; irreducible aperiodic Markov chain; Cost function; Heuristic algorithms; Iterative algorithms; Laboratories; Poisson equations; State-space methods

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Proceedings of the 36th IEEE Conference on Decision and Control, 1997

Conference_Location :

San Diego, CA

ISSN :

0191-2216

Print_ISBN :

0-7803-4187-2

Type :

conf

DOI :

10.1109/CDC.1997.650675

Filename :

650675

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2030031