• DocumentCode
    2717770
  • Title
    Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory
  • Author
    Antos, András ; Szepesvári, Csaba ; Munos, Rémi
  • Author_Institution
    Comput. & Autom. Res. Inst., Hungarian Acad. of Sci., Budapest
  • fYear
    2007
  • fDate
    1-5 April 2007
  • Firstpage
    330
  • Lastpage
    337
  • Abstract
    We consider batch reinforcement learning in continuous-space, expected total discounted-reward Markovian decision problems where the training data consist of a single trajectory of some fixed behaviour policy. The algorithm studied is policy iteration in which, in successive iterations, the action-value functions of the intermediate policies are obtained by approximate value iteration. PAC-style polynomial bounds are derived on the number of samples needed to guarantee near-optimal performance. The bounds depend on the mixing rate of the trajectory, the smoothness properties of the underlying Markovian decision problem, and the approximation power and capacity of the function set used. One of the main novelties of the paper is the introduction of new smoothness constraints, which significantly extends the scope of previous results.
  • Keywords
    Markov processes; continuous systems; iterative methods; learning (artificial intelligence); action-value function; approximate value iteration; batch reinforcement learning; continuous space; discounted-reward Markovian decision problem; policy iteration; single trajectory; Algorithm design and analysis; Automation; Control systems; Dynamic programming; Extraterrestrial measurements; Interleaved codes; Learning; Polynomials; State-space methods; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Approximate Dynamic Programming and Reinforcement Learning, 2007. ADPRL 2007. IEEE International Symposium on
  • Conference_Location
    Honolulu, HI
  • Print_ISBN
    1-4244-0706-0
  • Type
    conf
  • DOI
    10.1109/ADPRL.2007.368207
  • Filename
    4220852