A Grey Synthesis Approach to Efficient Architecture Design for Temporal Difference Learning

Author

Hwang, Kao-Shing ; Lo, Chia-Yue ; Lee, Guan-Yuan

Author_Institution

Nat. Chung-Cheng Univ., Chiayi, Taiwan

Volume

16

Issue

6

fYear

2011

Firstpage

1136

Lastpage

1144

Abstract

Temporal difference (TD) learning constitutes a class of methods for learning predictions in multistep prediction problems. The most important application of these methods is temporal credit assignment in reinforcement learning. Although TD procedures work in theory and in principle, their success is contingent on the proper selection of parameter values. In addition, their learning relies largely on repeated exposure, which may not always be practical or feasible. This paper examines the efficient and general implementation of TD in hardware realizations of reinforcement learning algorithms by synthesizing the series of discounted sums of rewards over time. The proposed algorithm eliminates all step-size parameters and improves data efficiency through a synthesis approach based on Grey theory. This paper also addresses the stability of the proposed algorithm from the viewpoint of Grey theory. The algorithm, along with a critic-actor reinforcement learning model, is implemented on a System-on-a-Programmable-Chip (SOPC) board. Experimental results, including a comparison with the well-known adaptive heuristic critic (AHC) model, demonstrate that the proposed control mechanism can learn to control a system with very little a priori knowledge. Moreover, the effect of uncertainty in interactions between the system and the environment can be relaxed to some extent in the learning process of the proposed reinforcement learning agent.

Keywords

grey systems; learning (artificial intelligence); programmable circuits; system-on-chip; temporal reasoning; architecture design; critic-actor reinforcement learning; grey synthesis approach; intelligent control; multistep prediction problems; system on a programmable chip; temporal credit assignment; temporal difference learning; Adaptive systems; Algorithm design and analysis; Artificial neural networks; Learning; Mathematical model; Predictive models; Grey theory; reinforcement learning; temporal difference

fLanguage

English

Journal_Title

Mechatronics, IEEE/ASME Transactions on

Publisher

IEEE

ISSN

1083-4435

Type

jour

DOI

10.1109/TMECH.2010.2082558

Filename

5613184