Title :
Average cost temporal-difference learning
Author :
Tsitsiklis, John N. ; Van Roy, Benjamin
Author_Institution :
Lab. for Inf. & Decision Syst., MIT, Cambridge, MA, USA
Abstract :
We describe a variant of temporal-difference learning that approximates average and differential costs of an irreducible aperiodic Markov chain. Approximations are composed of linear combinations of fixed basis functions whose weights are incrementally updated during a single endless trajectory of the Markov chain. We present results concerning convergence and the limit of convergence. We also provide a bound on the resulting approximation error that exhibits an interesting dependence on the “mixing time” of the Markov chain. The results parallel previous work by the authors (1997) involving approximations of discounted cost-to-go.
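The kind of update the abstract describes can be illustrated with a minimal Python sketch. It is not taken from the paper: the two-state chain, the costs, the single basis function, the trace parameter `lam`, and both step-size schedules are assumptions chosen only for illustration, and the function name `average_cost_td` is hypothetical. The sketch keeps a running estimate `mu` of the average cost and a weight vector `r` for a linear approximation of the differential cost, and updates both along one simulated trajectory.

```python
import random


def average_cost_td(P, g, phi, lam=0.5, steps=50000, seed=0):
    """Sketch of an average-cost TD(lambda)-style update (illustrative only).

    P   : row-stochastic transition matrix, as a list of lists
    g   : per-state cost
    phi : per-state feature vector (the fixed basis functions)
    """
    rng = random.Random(seed)
    states = range(len(P))
    r = [0.0] * len(phi[0])   # weights of the linear approximation
    mu = 0.0                  # running estimate of the average cost
    state = 0
    z = list(phi[state])      # eligibility trace, started at phi(i_0)

    for t in range(steps):
        nxt = rng.choices(states, weights=P[state])[0]

        # Approximate differential cost at the current and next state.
        v_cur = sum(w * f for w, f in zip(r, phi[state]))
        v_next = sum(w * f for w, f in zip(r, phi[nxt]))

        # Temporal difference for the average-cost setting:
        # one-stage cost minus the average-cost estimate, plus the change in value.
        d = g[state] - mu + v_next - v_cur

        # Assumed step-size schedules (illustrative choices).
        gamma = 10.0 / (100.0 + t)
        r = [w + gamma * d * e for w, e in zip(r, z)]
        mu += (g[state] - mu) / (t + 1)   # sample mean of observed costs

        # Decay the trace and add the features of the next state.
        z = [lam * e + f for e, f in zip(z, phi[nxt])]
        state = nxt

    return mu, r


# Assumed toy problem: two states, one basis function that is not constant.
P = [[0.9, 0.1], [0.2, 0.8]]
g = [1.0, 4.0]
phi = [[0.0], [1.0]]
mu, r = average_cost_td(P, g, phi)
```

For this toy chain the stationary distribution is (2/3, 1/3), so the true average cost is 2, and `mu` should settle near that value. The basis function is deliberately non-constant, since a constant offset adds nothing to a differential-cost approximation.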
Keywords :
Markov processes; convergence; decision theory; dynamic programming; learning (artificial intelligence); approximation error; average cost temporal-difference learning; discounted cost-to-go; fixed basis functions; irreducible aperiodic Markov chain; Cost function; Heuristic algorithms; Iterative algorithms; Laboratories; Poisson equations; State-space methods
Conference_Title :
Proceedings of the 36th IEEE Conference on Decision and Control, 1997
Conference_Location :
San Diego, CA
Print_ISBN :
0-7803-4187-2
DOI :
10.1109/CDC.1997.650675