DocumentCode :
2717368
Title :
The Knowledge Gradient Policy for Offline Learning with Independent Normal Rewards
Author :
Frazier, Peter ; Powell, Warren
Author_Institution :
Dept. of Operations Res. & Financial Eng., Princeton Univ., NJ
fYear :
2007
fDate :
1-5 April 2007
Firstpage :
143
Lastpage :
150
Abstract :
We define a new type of policy, the knowledge gradient policy, in the context of an offline learning problem. We show how to compute the knowledge gradient policy efficiently and demonstrate through Monte Carlo simulations that it performs as well as or better than a number of existing learning policies.
Keywords :
Monte Carlo methods; gradient methods; learning systems; operations research; Monte Carlo simulations; independent normal rewards; knowledge gradient policy; offline learning; Bandwidth; Bayesian methods; Dynamic programming; Knowledge engineering; Learning; Mirrors; Operations research; Performance evaluation; Response surface methodology; Time measurement
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Approximate Dynamic Programming and Reinforcement Learning, 2007. ADPRL 2007. IEEE International Symposium on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0706-0
Type :
conf
DOI :
10.1109/ADPRL.2007.368181
Filename :
4220826