• DocumentCode
    1364167
  • Title
    A Grey Synthesis Approach to Efficient Architecture Design for Temporal Difference Learning
  • Author
    Hwang, Kao-Shing ; Lo, Chia-Yue ; Lee, Guan-Yuan
  • Author_Institution
    Nat. Chung-Cheng Univ., Chiayi, Taiwan
  • Volume
    16
  • Issue
    6
  • fYear
    2011
  • Firstpage
    1136
  • Lastpage
    1144
  • Abstract
    Temporal difference (TD) methods are a class of methods for learning predictions in multistep prediction problems. Their most important application is temporal credit assignment in reinforcement learning. Although TD procedures work in theory and in principle, their success depends on a proper choice of parameter values. In addition, their learning relies mainly on repeated exposures, which may not always be practical or feasible. This paper examines the efficient and general implementation of TD for hardware implementations of reinforcement learning algorithms by synthesizing the series of discounted sums of rewards over time. The proposed algorithm eliminates all step-size parameters and improves data efficiency through a synthesis approach based on Grey theory. This paper also analyzes the stability of the proposed algorithm from the viewpoint of Grey theory. The algorithm, together with a critic-actor reinforcement learning model, is implemented on a System-on-a-Programmable-Chip (SOPC) board. Experiments, including a comparison with the well-known adaptive heuristic critic (AHC) model, demonstrate that the proposed control mechanism can learn to control a system with very little a priori knowledge. Moreover, the proposed reinforcement learning agent can partially mitigate the effect of uncertainty in the interactions between the system and the environment during learning.
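    For context on the abstract's two criticisms (dependence on a step-size parameter and on repeated exposures), the sketch below shows conventional tabular TD(0), which the paper positions itself against. It is not the authors' Grey synthesis algorithm; the record does not describe that method. The function names, the step size `alpha`, the discount `gamma`, and the two-state chain A -> B -> end are hypothetical choices made for illustration only.

```python
from typing import Dict, Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted sum of rewards: r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    total = 0.0
    # Accumulate from the last reward backwards: one multiply-add per step.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def td0_update(values: Dict[str, float], state: str, reward: float,
               next_state: str, alpha: float, gamma: float,
               terminal: bool = False) -> float:
    """One tabular TD(0) step: V(s) <- V(s) + alpha * delta.

    alpha is the step-size parameter that conventional TD must tune.
    Returns the TD error delta = r + gamma * V(s') - V(s).
    """
    next_value = 0.0 if terminal else values.get(next_state, 0.0)
    delta = reward + gamma * next_value - values.get(state, 0.0)
    values[state] = values.get(state, 0.0) + alpha * delta
    return delta


def run_chain_episodes(episodes: int, alpha: float,
                       gamma: float) -> Dict[str, float]:
    """Hypothetical chain A -> B -> end with rewards 0 then 1.

    Each episode updates A, then B, so the estimates only approach
    their true values after many repeated exposures.
    """
    values: Dict[str, float] = {}
    for _ in range(episodes):
        td0_update(values, "A", 0.0, "B", alpha, gamma)
        td0_update(values, "B", 1.0, "end", alpha, gamma, terminal=True)
    return values
```

    With `alpha = 0.1` and `gamma = 0.9`, one episode leaves V(A) at 0 and V(B) at 0.1; only after hundreds of episodes does V(A) approach the discounted return 0.9. This slow, step-size-dependent convergence is the kind of behavior the abstract says the proposed method aims to avoid.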
  • Keywords
    grey systems; learning (artificial intelligence); programmable circuits; system-on-chip; temporal reasoning; architecture design; critic-actor reinforcement learning; grey synthesis approach; intelligent control; multistep prediction problems; system on a programmable chip; temporal credit assignment; temporal difference learning; Adaptive systems; Algorithm design and analysis; Artificial neural networks; Intelligent control; Learning; Mathematical model; Predictive models; Grey theory; intelligent control; reinforcement learning; temporal difference;
  • fLanguage
    English
  • Journal_Title
    Mechatronics, IEEE/ASME Transactions on
  • Publisher
    IEEE
  • ISSN
    1083-4435
  • Type
    jour
  • DOI
    10.1109/TMECH.2010.2082558
  • Filename
    5613184