DocumentCode
2717368
Title
The Knowledge Gradient Policy for Offline Learning with Independent Normal Rewards
Author
Frazier, Peter ; Powell, Warren
Author_Institution
Dept. of Operations Res. & Financial Eng., Princeton Univ., NJ
fYear
2007
fDate
1-5 April 2007
Firstpage
143
Lastpage
150
Abstract
We define a new type of policy, the knowledge gradient policy, in the context of an offline learning problem. We show how to compute the knowledge gradient policy efficiently and demonstrate through Monte Carlo simulations that it performs as well as or better than a number of existing learning policies.
Keywords
Monte Carlo methods; gradient methods; learning systems; operations research; Monte Carlo simulations; independent normal rewards; knowledge gradient policy; offline learning; Bandwidth; Bayesian methods; Dynamic programming; Knowledge engineering; Learning; Mirrors; Operations research; Performance evaluation; Response surface methodology; Time measurement
fLanguage
English
Publisher
IEEE
Conference_Titel
Approximate Dynamic Programming and Reinforcement Learning, 2007. ADPRL 2007. IEEE International Symposium on
Conference_Location
Honolulu, HI
Print_ISBN
1-4244-0706-0
Type
conf
DOI
10.1109/ADPRL.2007.368181
Filename
4220826