Can Reinforcement Learning Always Provide the Best Policy

Author

Duan, Zhansheng ; Chen, Huimin

Author_Institution

Department of Electrical Engineering, University of New Orleans, 2000 Lakeshore Drive, New Orleans, LA 70148; College of Electronic and Information Engineering, Xi'an Jiaotong University.

fYear

2007

fDate

26-29 Aug. 2007

Firstpage

224

Lastpage

228

Abstract

Reinforcement learning deals with how to find the best policy in an uncertain environment to maximize some notion of long-term reward. In sequential decision making, it is often expected that the best policy can be designed by choosing an appropriate reward or penalty for each action. In this paper, we provide a counterexample showing that the best sequential decision rule cannot be obtained through any choice of reward function in the reinforcement learning framework. In fact, the best policy, namely the randomized sequential probability ratio test, can only be learned via a rather unconventional formulation of reinforcement learning. The implications for the design of classifier combining methods are also discussed.

Keywords

Decision making; Detectors; Lakes; Learning; Probability; Sensor systems; Sequential analysis; Signal processing; Statistics; Target recognition; Reinforcement learning; classifier combining; sequential decision

fLanguage

English

Publisher

IEEE

Conference_Titel

Statistical Signal Processing, 2007. SSP '07. IEEE/SP 14th Workshop on

Conference_Location

Madison, WI, USA

Print_ISBN

978-1-4244-1198-6

Electronic_ISBN

978-1-4244-1198-6

Type

conf

DOI

10.1109/SSP.2007.4301252

Filename

4301252