DocumentCode
3661166
Title
A boundedness theoretical analysis for GrADP design: A case study on maze navigation
Author
Zhen Ni;Xiangnan Zhong;Haibo He
Author_Institution
Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, USA 02881
fYear
2015
fDate
7/1/2015 12:00:00 AM
Firstpage
1
Lastpage
8
Abstract
This paper presents a new theoretical analysis of the goal representation adaptive dynamic programming (GrADP) design proposed in [1], [2]. Unlike the convergence proofs for adaptive dynamic programming (ADP) in the literature, we provide new insight into the error bound between the estimated value function and the expected value function. We employ the critic network in the GrADP approach to approximate the Q value function, and use the action network to provide the control policy. The goal network is adopted to provide the internal reinforcement signal for the critic network over time. Finally, we illustrate on a maze navigation example that the estimated Q value function stays within an arbitrarily small bound of the expected value function.
Keywords
"Optimal control","Stability analysis","Convergence"
Publisher
IEEE
Conference_Titel
Neural Networks (IJCNN), 2015 International Joint Conference on
Electronic_ISBN
2161-4407
Type
conf
DOI
10.1109/IJCNN.2015.7280475
Filename
7280475