  • DocumentCode
    1158303
  • Title
    Punish/Reward: Learning with a Critic in Adaptive Threshold Systems
  • Author
    Widrow, Bernard ; Gupta, Narendra K. ; Maitra, Sidhartha
  • Issue
    5
  • fYear
    1973
  • Firstpage
    455
  • Lastpage
    465
  • Abstract
    An adaptive threshold element is able to "learn" a strategy of play for the game blackjack (twenty-one) with a performance close to that of the Thorp optimal strategy although the adaptive system has no prior knowledge of the game and of the objective of play. After each winning game the decisions of the adaptive system are "rewarded." After each losing game the decisions are "punished." Reward is accomplished by adapting while accepting the actual decision as the desired response. Punishment is accomplished by adapting while taking the desired response to be the opposite of that of the actual decision. This learning scheme is unlike "learning with a teacher" and unlike "unsupervised learning." It involves "bootstrap adaptation" or "learning with a critic." The critic rewards decisions which are members of successful chains of decisions and punishes other decisions. A general analytical model for learning with a critic is formulated and analyzed. The model represents bootstrap learning per se. Although the hypotheses on which the model is based do not perfectly fit blackjack learning, it is applied heuristically to predict adaptation rates with good experimental success. New applications are being explored for bootstrap learning in adaptive controls and multilayered adaptive systems.
  • Keywords
    Adaptive control; Adaptive systems; Analytical models; Control system synthesis; Logic; Network synthesis; Predictive models; Programmable control; Steady-state; Unsupervised learning
  • fLanguage
    English
  • Journal_Title
    Systems, Man and Cybernetics, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9472
  • Type
    jour
  • DOI
    10.1109/TSMC.1973.4309272
  • Filename
    4309272
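
The reward/punish rule described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract does not state the weight-update rule, so an LMS-style (Widrow-Hoff) update is assumed here, and the names `ThresholdElement`, `critic_update`, `step`, and the bipolar (+1/-1) decision coding are all hypothetical choices made for the example. No blackjack logic is included; an episode is simply a list of (input, decision) pairs recorded during one game.

```python
import numpy as np


class ThresholdElement:
    """Single adaptive threshold element with a bias weight (assumed design)."""

    def __init__(self, n_inputs, step=0.05):
        self.w = np.zeros(n_inputs + 1)  # last weight acts as the bias
        self.step = step                 # adaptation constant (assumed value)

    def _augment(self, x):
        # Append a constant 1.0 so the bias is learned like any other weight.
        return np.append(np.asarray(x, dtype=float), 1.0)

    def decide(self, x):
        # Binary decision: +1 or -1, from the sign of the weighted sum.
        return 1 if self.w @ self._augment(x) >= 0 else -1

    def adapt(self, x, desired):
        # Assumed LMS-style step toward the desired response.
        xa = self._augment(x)
        error = desired - self.w @ xa
        self.w += self.step * error * xa


def critic_update(element, episode, won):
    """Reward or punish every decision made during one game.

    Reward: adapt with the actual decision taken as the desired response.
    Punish: adapt with the opposite of the actual decision as the desired response.
    """
    for x, decision in episode:
        desired = decision if won else -decision
        element.adapt(x, desired)
```

For example, with zero initial weights, `decide([1.0, -1.0])` returns `1`; calling `critic_update(elem, [([1.0, -1.0], 1)], won=False)` then moves the weights so that the same input yields `-1`, while `won=True` would reinforce the original decision. The critic supplies only a win/lose signal per game, so every decision in the chain receives the same treatment, as the abstract describes.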