Title :
Adaptive sample collection using active learning for kernel-based approximate policy iteration
Author :
Liu, Chunming ; Xu, Xin ; Hu, Haiyun ; Dai, Bin
Author_Institution :
Coll. of Mechatron. & Autom., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
Approximate policy iteration (API) has been shown to be a class of reinforcement learning methods with good stability and sample efficiency. However, sample collection remains an open problem that is critical to the performance of API methods. In this paper, a novel adaptive sample collection strategy based on active-learning-driven exploration is proposed to enhance the performance of kernel-based API. In this strategy, an online kernel-based least-squares policy iteration (KLSPI) method is adopted to construct nonlinear features and approximate the Q-function simultaneously, so that more representative samples can be obtained for value-function approximation. Simulation results on typical learning control problems show that the proposed strategy improves the performance of KLSPI markedly.
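As context for the abstract, the least-squares evaluation step at the core of LSPI can be sketched as follows. This is a minimal illustration only: it uses plain one-hot features and a toy two-state MDP rather than the paper's online kernel dictionary, and all names and the toy dynamics are hypothetical, not taken from the paper.

```python
# Minimal sketch of LSTD-Q, the least-squares policy-evaluation step inside
# LSPI. KLSPI as described in the abstract would additionally construct
# kernel-based nonlinear features online; here we use fixed one-hot features.
import numpy as np

n_states, n_actions = 2, 2
gamma = 0.9

def phi(s, a):
    """One-hot feature vector for the state-action pair (s, a)."""
    f = np.zeros(n_states * n_actions)
    f[s * n_actions + a] = 1.0
    return f

# Hypothetical toy dynamics: the agent stays in its state; action 1 pays reward 1.
samples = [(s, a, float(a), s) for s in range(n_states) for a in range(n_actions)]

def lstdq(samples, policy):
    """Fit Q^pi by solving A w = b accumulated from sampled transitions."""
    d = n_states * n_actions
    A = np.zeros((d, d))
    b = np.zeros(d)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        A += np.outer(f, f - gamma * phi(s_next, policy(s_next)))
        b += f * r
    return np.linalg.solve(A, b)

w = lstdq(samples, policy=lambda s: 1)   # evaluate the policy "always act 1"
q = lambda s, a: phi(s, a) @ w
# Greedy improvement over the fitted Q recovers the rewarding action everywhere.
print([int(np.argmax([q(s, a) for a in range(n_actions)])) for s in range(n_states)])
# → [1, 1]
```

The adaptive sample-collection idea in the abstract concerns which transitions enter `samples`: active-learning-based exploration aims to pick transitions that make the least-squares fit above as informative as possible.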
Keywords :
function approximation; learning (artificial intelligence); least squares approximations; Q-function approximation; active learning-based exploration; adaptive sample collection strategy; kernel-based API; kernel-based approximate policy iteration; online kernel-based least squares policy iteration method; reinforcement learning methods; Algorithm design and analysis; Approximation algorithms; Dictionaries; Function approximation; Kernel; Learning; Least Squares Policy Iteration (LSPI); approximate policy iteration; kernel methods; reinforcement learning; sample collection;
Conference_Title :
Adaptive Dynamic Programming And Reinforcement Learning (ADPRL), 2011 IEEE Symposium on
Conference_Location :
Paris
Print_ISBN :
978-1-4244-9887-1
DOI :
10.1109/ADPRL.2011.5967377