Title :
The Knowledge Gradient Policy for Offline Learning with Independent Normal Rewards
Author :
Frazier, Peter; Powell, Warren
Author_Institution :
Department of Operations Research and Financial Engineering, Princeton University, NJ
Abstract :
We define a new type of policy, the knowledge gradient policy, in the context of an offline learning problem. We show how to compute the knowledge gradient policy efficiently and demonstrate through Monte Carlo simulations that it performs as well as or better than a number of existing learning policies.
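The abstract does not spell out the computation. The sketch below illustrates one way to compute a knowledge gradient policy in this setting, assuming independent normal beliefs (posterior mean `mu[x]` and variance `sigma2[x]` per alternative) and a known, common measurement-noise variance `noise_var`. It uses the closed form commonly given for this case, KG(x) = s·f(−|μ_x − max_{x'≠x} μ_{x'}| / s) with f(z) = zΦ(z) + φ(z) and s = σ_x² / √(σ_x² + λ); it is an illustration, not the authors' code, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def _norm_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kg_values(mu: Sequence[float], sigma2: Sequence[float],
              noise_var: float) -> List[float]:
    """Knowledge-gradient value of measuring each alternative once.

    Assumes independent normal beliefs and a known measurement-noise
    variance shared by all alternatives (hypothetical interface).
    """
    m = len(mu)
    if m < 2 or len(sigma2) != m:
        raise ValueError("need at least two alternatives with matching lengths")
    # Indices of the best and second-best posterior means, so each
    # "best competitor" lookup is O(1) instead of a fresh scan.
    order = sorted(range(m), key=lambda i: mu[i], reverse=True)
    best, second = order[0], order[1]
    values = []
    for x in range(m):
        # Std. dev. of the change in mu[x] caused by one more measurement.
        s = sigma2[x] / math.sqrt(sigma2[x] + noise_var)
        if s == 0.0:
            values.append(0.0)  # a perfectly known alternative has no value to measure
            continue
        best_other = mu[second] if x == best else mu[best]
        z = -abs(mu[x] - best_other) / s
        values.append(s * (z * _norm_cdf(z) + _norm_pdf(z)))
    return values


def kg_policy(mu: Sequence[float], sigma2: Sequence[float],
              noise_var: float) -> int:
    """Index of the alternative with the largest KG value (ties -> lowest index)."""
    kg = kg_values(mu, sigma2, noise_var)
    return max(range(len(kg)), key=kg.__getitem__)
```

For example, with equal means the policy favors the alternative whose belief is most uncertain: `kg_policy([1.0, 0.0, 0.0], [0.0, 1.0, 4.0], 1.0)` selects index 2. Tracking only the top two means keeps the cost of one decision at a sort over the alternatives.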
Keywords :
Monte Carlo methods; gradient methods; learning systems; operations research; Monte Carlo simulations; independent normal rewards; knowledge gradient policy; offline learning; Bandwidth; Bayesian methods; Dynamic programming; Knowledge engineering; Learning; Mirrors; Operations research; Performance evaluation; Response surface methodology; Time measurement;
Conference_Title :
2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007)
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0706-0
DOI :
10.1109/ADPRL.2007.368181