DocumentCode :
1346332
Title :
Hierarchical Approximate Policy Iteration With Binary-Tree State Space Decomposition
Author :
Xu, Xin ; Liu, Chunming ; Yang, Simon X. ; Hu, Dewen
Author_Institution :
College of Mechatronics and Automation, National University of Defense Technology, Changsha, China
Volume :
22
Issue :
12
fYear :
2011
Firstpage :
1863
Lastpage :
1877
Abstract :
In recent years, approximate policy iteration (API) has attracted increasing attention in reinforcement learning (RL), e.g., least-squares policy iteration (LSPI) and its kernelized version, the kernel-based LSPI (KLSPI) algorithm. However, it remains difficult for API algorithms to obtain near-optimal policies for Markov decision processes (MDPs) with large or continuous state spaces. To address this problem, this paper presents a hierarchical API (HAPI) method with binary-tree state space decomposition for RL in a class of absorbing MDPs, which can be formulated as time-optimal learning control tasks. In the proposed method, after samples are collected adaptively in the state space of the original MDP, a learning-based strategy for decomposing the sample sets carries out the binary-tree state space decomposition. API algorithms are then applied to the sample subsets to approximate local optimal policies of the sub-MDPs. Because the original MDP is decomposed into a binary-tree structure of absorbing sub-MDPs constructed during the learning process, local near-optimal policies can be approximated by API algorithms with reduced complexity and higher precision. Furthermore, owing to the improved quality of the local policies, the combined global policy performs better than the near-optimal policy obtained by a single API algorithm on the original MDP. Three learning control problems, including path-tracking control of a real mobile robot, were studied to evaluate the performance of the HAPI method. With the same settings for basis function selection and sample collection, the proposed HAPI method obtained better near-optimal policies than previous API methods such as LSPI and KLSPI.
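The abstract outlines a three-stage pipeline: collect samples, decompose the state space into a binary tree of sub-MDPs, and learn a local policy on each leaf's sample subset with an API algorithm. Below is a minimal Python sketch of that flow, not the authors' implementation: the median-threshold split rule and the `lspi` stand-in (a placeholder that performs no actual policy iteration) are illustrative assumptions, and all names are hypothetical.

```python
import numpy as np

def lspi(samples):
    """Stand-in for an API learner such as LSPI; returns a policy."""
    # Trivial placeholder policy: always take the most frequent action
    # in the subset. Real LSPI alternates least-squares Q-function
    # evaluation with greedy policy improvement.
    actions = [a for (_, a, _, _) in samples]
    majority = max(set(actions), key=actions.count)
    return lambda state: majority

class Node:
    """One node of the binary-tree state-space decomposition."""
    def __init__(self, samples, depth=0, max_depth=4, min_samples=50):
        self.split = None
        if depth >= max_depth or len(samples) < min_samples:
            self.policy = lspi(samples)  # leaf: fit a local policy
            return
        # Illustrative split rule: threshold the state dimension with the
        # largest spread at its median (the paper instead learns the
        # decomposition from the collected samples).
        states = np.array([s for (s, _, _, _) in samples])
        dim = int(np.argmax(np.ptp(states, axis=0)))
        thr = float(np.median(states[:, dim]))
        left = [t for t in samples if t[0][dim] <= thr]
        right = [t for t in samples if t[0][dim] > thr]
        if not left or not right:  # degenerate split: keep as a leaf
            self.policy = lspi(samples)
            return
        self.split = (dim, thr)
        self.left = Node(left, depth + 1, max_depth, min_samples)
        self.right = Node(right, depth + 1, max_depth, min_samples)

    def act(self, state):
        """Route a state down the tree to its leaf's local policy."""
        if self.split is None:
            return self.policy(state)
        dim, thr = self.split
        return (self.left if state[dim] <= thr else self.right).act(state)

# Usage with synthetic (state, action, reward, next_state) transitions.
rng = np.random.default_rng(0)
samples = [(rng.uniform(-1.0, 1.0, size=2), int(rng.integers(2)), -1.0, None)
           for _ in range(500)]
global_policy = Node(samples)
print(global_policy.act(np.array([0.3, -0.7])))
```

The combined global policy here is simply the union of the leaf policies with tree routing; the claimed benefit in the paper is that each leaf's sub-MDP is small enough for API to approximate its local policy with higher precision.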
Keywords :
Markov processes; decision theory; iterative methods; learning (artificial intelligence); mobile robots; path planning; trees (mathematics); Markov decision processes; basis function selection; binary-tree state space decomposition; binary-tree structure; continuous state spaces; hierarchical approximate policy iteration; kernel-based least-squares policy iteration algorithm; learning control problem; learning-based decomposition strategy; local near-optimal policy; mobile robot; path-tracking control; reinforcement learning; time-optimal learning control tasks; Approximation algorithms; Dynamic programming; Function approximation; Learning; Markov processes; Optimal control; Adaptive dynamic programming; Markov decision processes; approximate policy iteration; binary-tree; hierarchical reinforcement learning; time-optimal control; Algorithms; Artificial Intelligence; Computer Simulation; Models, Theoretical; Pattern Recognition, Automated;
fLanguage :
English
Journal_Title :
IEEE Transactions on Neural Networks
Publisher :
IEEE
ISSN :
1045-9227
Type :
jour
DOI :
10.1109/TNN.2011.2168422
Filename :
6041034