DocumentCode
2717050
Title
Fitted Q Iteration with CMACs
Author
Timmer, Stephan ; Riedmiller, Martin
Author_Institution
Dept. of Comput. Sci., Osnabrück Univ.
fYear
2007
fDate
1-5 April 2007
Firstpage
1
Lastpage
8
Abstract
A major issue in model-free reinforcement learning is how to efficiently exploit the data collected by an exploration strategy. This is especially important in the case of continuous, high-dimensional state spaces, since it is impossible to explore such spaces exhaustively. A simple but promising approach is to fix the number of state transitions that are sampled from the underlying Markov decision process. For several kernel-based learning algorithms, convergence proofs and notable empirical results exist when a fixed set of transition instances is used. In this article, we analyze how function approximators similar to the CMAC architecture can be combined with this idea. We show both analytically and empirically the potential power of the CMAC architecture combined with an offline version of Q-learning.
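The abstract describes the approach only at a high level, so the following is a minimal Python sketch, under assumptions, of how fitted Q iteration with a CMAC-style (tile-coding) approximator over a fixed batch of transitions might look. All names and parameters here (the CMAC class, fitted_q_iteration, n_tilings, n_bins, the learning rate, the toy task) are illustrative choices, not the authors' implementation.

```python
import numpy as np

class CMAC:
    """Illustrative tile-coding (CMAC-style) linear approximator on a box state space."""
    def __init__(self, low, high, n_tilings=8, n_bins=8, n_actions=2, lr=0.1):
        self.low, self.high = np.asarray(low, float), np.asarray(high, float)
        self.n_tilings, self.n_bins, self.n_actions = n_tilings, n_bins, n_actions
        self.lr = lr / n_tilings                      # step size shared across tilings
        dims = len(self.low)
        # One weight table per (tiling, action); tiles live on a uniform grid.
        self.w = np.zeros((n_tilings, n_actions) + (n_bins,) * dims)
        # Fixed random offsets displace the tilings against each other.
        rng = np.random.default_rng(0)
        self.offsets = rng.uniform(0, 1.0 / n_bins, size=(n_tilings, dims))

    def _tiles(self, s):
        x = (np.asarray(s, float) - self.low) / (self.high - self.low)  # scale to [0,1]
        idx = ((x + self.offsets) * self.n_bins).astype(int)
        return np.clip(idx, 0, self.n_bins - 1)       # shape (n_tilings, dims)

    def q(self, s, a):
        # Q(s, a) is the sum of the weights of the active tiles, one per tiling.
        return sum(self.w[(t, a) + tuple(ix)] for t, ix in enumerate(self._tiles(s)))

    def q_all(self, s):
        return np.array([self.q(s, a) for a in range(self.n_actions)])

    def update(self, s, a, target):
        # Gradient step on the squared error; the error is shared across tilings.
        err = target - self.q(s, a)
        for t, ix in enumerate(self._tiles(s)):
            self.w[(t, a) + tuple(ix)] += self.lr * err

def fitted_q_iteration(transitions, cmac, gamma=0.95, iterations=50, sweeps=5):
    """Offline (batch) Q-learning over a fixed, pre-collected transition set.

    transitions: list of (s, a, r, s_next, done) sampled once from the MDP.
    """
    for _ in range(iterations):
        # Freeze the regression targets under the current Q, then fit to them.
        targets = [r + (0.0 if done else gamma * cmac.q_all(s2).max())
                   for (s, a, r, s2, done) in transitions]
        for _ in range(sweeps):
            for (s, a, r, s2, done), y in zip(transitions, targets):
                cmac.update(s, a, y)
    return cmac

# Toy usage: random transitions on a 1-D interval with two actions (left/right).
rng = np.random.default_rng(1)
batch = []
for _ in range(500):
    s = rng.uniform(0, 1, size=1)
    a = int(rng.integers(2))
    s2 = np.clip(s + (0.1 if a == 1 else -0.1), 0, 1)
    done = bool(s2[0] > 0.9)                # episode ends at the right end
    batch.append((s, a, float(done), s2, done))
q = fitted_q_iteration(batch, CMAC(low=[0.0], high=[1.0], n_actions=2))
print(q.q_all(np.array([0.5])))             # action 1 (move right) should score higher
```

The structural point this sketch shares with the abstract is that the transition set is collected once and then reused: each iteration freezes the targets r + γ max_a' Q(s', a') before sweeping the batch, turning offline Q-learning into a sequence of supervised fits.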
Keywords
Markov processes; cerebellar model arithmetic computers; computer architecture; iterative methods; learning (artificial intelligence); CMAC architecture; Markov decision process; Q-learning; fitted Q iteration; function approximators; kernel-based learning; reinforcement learning; Algorithm design and analysis; Computer science; Convergence; Dynamic programming; Inference algorithms; Interleaved codes; Sampling methods; Space exploration; State-space methods; Supervised learning
fLanguage
English
Publisher
IEEE
Conference_Titel
2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007)
Conference_Location
Honolulu, HI
Print_ISBN
1-4244-0706-0
Type
conf
DOI
10.1109/ADPRL.2007.368162
Filename
4220807