• DocumentCode
    3388874
  • Title

    Can Reinforcement Learning Always Provide the Best Policy

  • Author

    Duan, Zhansheng ; Chen, Huimin

  • Author_Institution
    Department of Electrical Engineering, University of New Orleans, 2000 Lakeshore Drive, New Orleans, LA 70148; College of Electronic and Information Engineering, Xi'an Jiaotong University.
  • fYear
    2007
  • fDate
    26-29 Aug. 2007
  • Firstpage
    224
  • Lastpage
    228
  • Abstract
    Reinforcement learning deals with how to find the best policy in an uncertain environment so as to maximize some notion of long-term reward. In sequential decision making, it is often expected that the best policy can be designed by choosing an appropriate reward or penalty for each action. In this paper, we provide a counterexample showing that the best sequential decision rule cannot be obtained through the choice of any reward function in the reinforcement learning framework. In fact, the best policy, namely the randomized sequential probability ratio test, can only be learned via a rather unconventional formulation of reinforcement learning. The implications for the design of classifier combining methods are also discussed.
  • Keywords
    Decision making; Detectors; Lakes; Learning; Probability; Sensor systems; Sequential analysis; Signal processing; Statistics; Target recognition; Reinforcement learning; classifier combining; sequential decision;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Statistical Signal Processing, 2007. SSP '07. IEEE/SP 14th Workshop on
  • Conference_Location
    Madison, WI, USA
  • Print_ISBN
    978-1-4244-1198-6
  • Electronic_ISBN
    978-1-4244-1198-6
  • Type

    conf

  • DOI
    10.1109/SSP.2007.4301252
  • Filename
    4301252