DocumentCode :
2030031
Title :
Average cost temporal-difference learning
Author :
Tsitsiklis, John N. ; Van Roy, Benjamin
Author_Institution :
Laboratory for Information and Decision Systems, MIT, Cambridge, MA, USA
Volume :
1
fYear :
1997
fDate :
10-12 Dec 1997
Firstpage :
498
Abstract :
We describe a variant of temporal-difference learning that approximates the average and differential costs of an irreducible aperiodic Markov chain. Approximations are composed of linear combinations of fixed basis functions, whose weights are updated incrementally along a single endless trajectory of the Markov chain. We present results on convergence and characterize the limit of convergence. We also provide a bound on the resulting approximation error that exhibits an interesting dependence on the “mixing time” of the Markov chain. The results parallel previous work by the authors (1997) involving approximations of discounted cost-to-go.
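The incremental scheme described in the abstract can be illustrated with a small Python sketch. It is hypothetical and assumes a TD(λ)-style recursion: a running estimate mu of the average cost, a temporal difference d = g(i_t) - mu + φ(i_{t+1})ᵀr - φ(i_t)ᵀr, and an eligibility trace z that decays by λ and accumulates φ(i_{t+1}). The function name average_cost_td, the step-size schedule 1/(t + 100), the constant c linking the two step sizes, and the example chain are illustrative assumptions, not details taken from the paper.

```python
import random


def average_cost_td(P, g, phi, lam=0.5, steps=100_000, c=1.0, seed=0):
    """Hypothetical sketch of average-cost TD(lambda) with linear features.

    P   -- transition matrix of an irreducible aperiodic chain (list of rows)
    g   -- per-state cost g(i)
    phi -- per-state feature vectors phi(i), all of the same length
    Returns (mu, r): the average-cost estimate and the weight vector whose
    linear combination phi(i)^T r approximates the differential cost.
    """
    rng = random.Random(seed)
    n, k = len(P), len(phi[0])
    mu = 0.0
    r = [0.0] * k
    i = 0                      # start the single trajectory in state 0
    z = list(phi[i])           # eligibility trace, z_0 = phi(i_0)
    for t in range(steps):
        gamma = 1.0 / (t + 100)   # assumed diminishing step size for r
        eta = c * gamma           # assumed step size for mu
        j = rng.choices(range(n), weights=P[i])[0]  # simulate i_{t+1}
        v_i = sum(f * w for f, w in zip(phi[i], r))
        v_j = sum(f * w for f, w in zip(phi[j], r))
        d = g[i] - mu + v_j - v_i           # temporal difference d_t (uses old mu)
        mu = (1 - eta) * mu + eta * g[i]    # update average-cost estimate
        r = [w + gamma * d * zk for w, zk in zip(r, z)]   # weight update
        z = [lam * zk + f for zk, f in zip(z, phi[j])]    # trace update
        i = j
    return mu, r


# Two-state example: either state is reached next with probability 1/2.
P = [[0.5, 0.5], [0.5, 0.5]]
g = [1.0, 3.0]
phi = [[1.0], [-1.0]]   # the constant vector is not in the span of these features
mu, r = average_cost_td(P, g, phi, steps=50_000)
print(round(mu, 2), round(r[0], 2))
```

In this two-state chain the stationary distribution is uniform, so the average cost is (1 + 3)/2 = 2 and mu should come out near 2. With the single feature (1, -1), r[0] should settle near -1, matching the differential costs (-1, 1), which are defined only up to an additive constant.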
Keywords :
Markov processes; convergence; decision theory; dynamic programming; learning (artificial intelligence); approximation error; average cost temporal-difference learning; convergence; discounted cost-to-go; fixed basis functions; irreducible aperiodic Markov chain; Approximation error; Convergence; Cost function; Dynamic programming; Heuristic algorithms; Iterative algorithms; Laboratories; Markov processes; Poisson equations; State-space methods;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Proceedings of the 36th IEEE Conference on Decision and Control, 1997
Conference_Location :
San Diego, CA
ISSN :
0191-2216
Print_ISBN :
0-7803-4187-2
Type :
conf
DOI :
10.1109/CDC.1997.650675
Filename :
650675