DocumentCode
2717050
Title
Fitted Q Iteration with CMACs
Author
Timmer, Stephan ; Riedmiller, Martin
Author_Institution
Dept. of Comput. Sci., Osnabrück Univ.
fYear
2007
fDate
1-5 April 2007
Firstpage
1
Lastpage
8
Abstract
A major issue in model-free reinforcement learning is how to efficiently exploit the data collected by an exploration strategy. This is especially important in the case of continuous, high-dimensional state spaces, since it is impossible to explore such spaces exhaustively. A simple but promising approach is to fix the number of state transitions that are sampled from the underlying Markov decision process. For several kernel-based learning algorithms, convergence proofs and notable empirical results exist when a fixed set of transition instances is used. In this article, we analyze how function approximators similar to the CMAC architecture can be combined with this idea. We show both analytically and empirically the potential power of the CMAC architecture combined with an offline version of Q-learning.
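The abstract describes the approach only at a high level, so the following is a minimal Python sketch, under assumptions, of how fitted Q iteration with a CMAC-style (tile-coding) approximator over a fixed batch of transitions might look. All names and parameters here (the CMAC class, fitted_q_iteration, n_tilings, n_bins, the learning rate, the toy task) are illustrative choices, not the authors' implementation.

```python
import numpy as np

class CMAC:
    """Illustrative tile-coding (CMAC-style) linear approximator on a box state space."""
    def __init__(self, low, high, n_tilings=8, n_bins=8, n_actions=2, lr=0.1):
        self.low, self.high = np.asarray(low, float), np.asarray(high, float)
        self.n_tilings, self.n_bins, self.n_actions = n_tilings, n_bins, n_actions
        self.lr = lr / n_tilings                      # step size shared across tilings
        dims = len(self.low)
        # One weight table per (tiling, action); tiles live on a uniform grid.
        self.w = np.zeros((n_tilings, n_actions) + (n_bins,) * dims)
        # Fixed random offsets displace the tilings against each other.
        rng = np.random.default_rng(0)
        self.offsets = rng.uniform(0, 1.0 / n_bins, size=(n_tilings, dims))

    def _tiles(self, s):
        x = (np.asarray(s, float) - self.low) / (self.high - self.low)  # scale to [0,1]
        idx = ((x + self.offsets) * self.n_bins).astype(int)
        return np.clip(idx, 0, self.n_bins - 1)       # shape (n_tilings, dims)

    def q(self, s, a):
        # Q(s, a) is the sum of the weights of the active tiles, one per tiling.
        return sum(self.w[(t, a) + tuple(ix)] for t, ix in enumerate(self._tiles(s)))

    def q_all(self, s):
        return np.array([self.q(s, a) for a in range(self.n_actions)])

    def update(self, s, a, target):
        # Gradient step on the squared error; the error is shared across tilings.
        err = target - self.q(s, a)
        for t, ix in enumerate(self._tiles(s)):
            self.w[(t, a) + tuple(ix)] += self.lr * err

def fitted_q_iteration(transitions, cmac, gamma=0.95, iterations=50, sweeps=5):
    """Offline (batch) Q-learning over a fixed, pre-collected transition set.

    transitions: list of (s, a, r, s_next, done) sampled once from the MDP.
    """
    for _ in range(iterations):
        # Freeze the regression targets under the current Q, then fit to them.
        targets = [r + (0.0 if done else gamma * cmac.q_all(s2).max())
                   for (s, a, r, s2, done) in transitions]
        for _ in range(sweeps):
            for (s, a, r, s2, done), y in zip(transitions, targets):
                cmac.update(s, a, y)
    return cmac

# Toy usage: random transitions on a 1-D interval with two actions (left/right).
rng = np.random.default_rng(1)
batch = []
for _ in range(500):
    s = rng.uniform(0, 1, size=1)
    a = int(rng.integers(2))
    s2 = np.clip(s + (0.1 if a == 1 else -0.1), 0, 1)
    done = bool(s2[0] > 0.9)                # episode ends at the right end
    batch.append((s, a, float(done), s2, done))
q = fitted_q_iteration(batch, CMAC(low=[0.0], high=[1.0], n_actions=2))
print(q.q_all(np.array([0.5])))             # action 1 (move right) should score higher
```

The structural point this sketch shares with the abstract is that the transition set is collected once and then reused: each iteration freezes the targets r + γ max_a' Q(s', a') before sweeping the batch, turning offline Q-learning into a sequence of supervised fits.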
Keywords
Markov processes; cerebellar model arithmetic computers; computer architecture; iterative methods; learning (artificial intelligence); CMAC architecture; Markov decision process; Q-learning; fitted Q iteration; function approximators; kernel-based learning; reinforcement learning; Algorithm design and analysis; Computer science; Convergence; Dynamic programming; Inference algorithms; Interleaved codes; Sampling methods; Space exploration; State-space methods; Supervised learning
fLanguage
English
Publisher
IEEE
Conference_Titel
2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007)
Conference_Location
Honolulu, HI
Print_ISBN
1-4244-0706-0
Type
conf
DOI
10.1109/ADPRL.2007.368162
Filename
4220807