Multiresolution state-space discretization method for Q-learning with function approximation and policy iteration

Author

Lampton, Amanda ; Valasek, John

Author_Institution

Dept. of Aerosp. Eng., Texas A&M Univ., College Station, TX, USA

fYear

2009

fDate

11-14 Oct. 2009

Firstpage

2677

Lastpage

2682

Abstract

A multiresolution state-space discretization method is developed for the episodic unsupervised learning method of Q-learning. In addition, a genetic algorithm is used periodically during learning to approximate the action-value function. Policy iteration is added as a stopping criterion for the algorithm. For large-scale problems, Q-learning often suffers from the curse of dimensionality due to the large number of possible state-action pairs. This paper develops a method whereby a state-space is adaptively discretized by progressively finer grids around the areas of interest within the state or learning space. Policy iteration is added to prevent unnecessary episodes at each level of discretization once the learning has converged. Utility of the method is demonstrated with application to the problem of a morphing airfoil with two morphing parameters (two state variables). By setting the multiresolution method to define the area of interest by the goal the agent seeks, it is shown that this method can learn a specific goal within ±0.002, while reducing the total number of episodes needed to converge by 85% from the allotted total possible episodes. It is also shown that a good approximation of the action-value function is produced with 80% agreement between the tabulated and approximated policy, though empirically the approximated policy appears to be superior.
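The abstract's idea of progressively finer grids around a goal, with a policy-based stopping check, can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: the function names (`refine_bounds`, `multiresolution_levels`, `policy_unchanged`), the rule of keeping the goal's cell plus one neighbor on each side, and the "greedy policy did not change" test are all assumptions made for illustration. The Q-learning updates and the genetic-algorithm approximation are omitted.

```python
def refine_bounds(lo, hi, goal, n_cells):
    """Shrink [lo, hi] to the cell containing `goal` plus one neighbor
    cell on each side, clipped to the original interval (assumed rule)."""
    width = (hi - lo) / n_cells
    idx = min(int((goal - lo) / width), n_cells - 1)  # cell index of the goal
    new_lo = max(lo, lo + (idx - 1) * width)
    new_hi = min(hi, lo + (idx + 2) * width)
    return new_lo, new_hi


def multiresolution_levels(lo, hi, goal, n_cells, tol):
    """Return one (lo, hi, cell_width) tuple per discretization level,
    refining around `goal` until the cell width is at most `tol`."""
    if n_cells <= 3:
        raise ValueError("n_cells must exceed 3 so each level shrinks")
    levels = []
    while True:
        width = (hi - lo) / n_cells
        levels.append((lo, hi, width))
        if width <= tol:
            return levels
        lo, hi = refine_bounds(lo, hi, goal, n_cells)


def policy_unchanged(q_old, q_new):
    """Hypothetical stopping test: True when the greedy action in every
    state is the same for both Q-tables (lists of per-state action values)."""
    def greedy(q):
        return [row.index(max(row)) for row in q]
    return greedy(q_old) == greedy(q_new)


if __name__ == "__main__":
    # Example values are arbitrary; 0.002 mirrors the tolerance in the abstract.
    for lo, hi, width in multiresolution_levels(0.0, 1.0, 0.37, 10, 0.002):
        print(f"range=[{lo:.4f}, {hi:.4f}]  cell width={width:.5f}")
```

In a full learner, episodes would run at each level until a check such as `policy_unchanged` reports no change between successive Q-tables, after which the grid would be refined to the next level.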

Keywords

discrete systems; function approximation; genetic algorithms; intelligent robots; iterative methods; learning systems; optimal control; state-space methods; unsupervised learning; Q-learning; action-value function approximation; agent goal; dimensionality curse; episodic unsupervised learning method; genetic algorithm; intelligent robot; morphing airfoil problem; multiresolution state-space discretization method; optimal control policy iteration; state-action pair; state-space method; stopping criterion; Automotive components; Convergence; Cybernetics; Function approximation; Genetic algorithms; Large-scale systems; Orbital robotics; Space vehicles; USA Councils; Unsupervised learning; Function Approximation; Genetic Algorithm; Multiresolution; Policy Iteration; Q-learning;

fLanguage

English

Publisher

IEEE

Conference_Titel

Systems, Man and Cybernetics, 2009. SMC 2009. IEEE International Conference on

Conference_Location

San Antonio, TX

ISSN

1062-922X

Print_ISBN

978-1-4244-2793-2

Electronic_ISBN

1062-922X

Type

conf

DOI

10.1109/ICSMC.2009.5346129

Filename

5346129