• DocumentCode
    2717368
  • Title
    The Knowledge Gradient Policy for Offline Learning with Independent Normal Rewards
  • Author
    Frazier, Peter ; Powell, Warren
  • Author_Institution
    Dept. of Operations Research & Financial Engineering, Princeton University, NJ
  • fYear
    2007
  • fDate
    1-5 April 2007
  • Firstpage
    143
  • Lastpage
    150
  • Abstract
    We define a new type of policy, the knowledge gradient policy, in the context of an offline learning problem. We show how to compute the knowledge gradient policy efficiently and demonstrate through Monte Carlo simulations that it performs as well as or better than a number of existing learning policies.
  • Keywords
    Monte Carlo methods; gradient methods; learning systems; operations research; Monte Carlo simulations; independent normal rewards; knowledge gradient policy; offline learning; Bandwidth; Bayesian methods; Dynamic programming; Knowledge engineering; Learning; Mirrors; Performance evaluation; Response surface methodology; Time measurement
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007)
  • Conference_Location
    Honolulu, HI
  • Print_ISBN
    1-4244-0706-0
  • Type
    conf
  • DOI
    10.1109/ADPRL.2007.368181
  • Filename
    4220826