DocumentCode :
2297122
Title :
An exemplar test problem on parameter convergence analysis of temporal difference algorithms
Author :
Brown, Martin ; Tutsoy, Onder
Author_Institution :
Control Syst. Group, Univ. of Manchester, Manchester, UK
fYear :
2012
fDate :
6-8 July 2012
Firstpage :
2925
Lastpage :
2930
Abstract :
Reinforcement learning techniques have been developed to solve difficult learning control problems in which little a priori knowledge about the system dynamics is available. In this paper, a simple unstable exemplar test problem is proposed to investigate issues in parametric convergence of the value function. A specific closed-form solution for the value function is determined, which has a polynomial form. It is proved that the temporal difference error introduces a null space associated with the finite horizon basis function during the control trajectory. The learning problem can be nonsingular only if the termination is handled correctly, and a number of possible solutions are introduced. This result was revealed only because of the derived closed-form solution for the value function.
Keywords :
convergence; infinite horizon; learning (artificial intelligence); learning systems; polynomials; a priori knowledge; closed-form solution; control trajectory; finite horizon basis function; learning control problems; learning problem; null space; parameter convergence analysis; parametric convergence; polynomial form; reinforcement learning techniques; system dynamics; temporal difference algorithms; temporal difference error; unstable exemplar test problem; value function; Algorithm design and analysis; Closed-form solutions; Convergence; Null space; Polynomials; Trajectory; Vectors; Reinforcement learning; polynomial basis functions; rate of convergence; temporal difference learning; value function approximation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Control and Automation (WCICA), 2012 10th World Congress on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-1397-1
Type :
conf
DOI :
10.1109/WCICA.2012.6358370
Filename :
6358370