• DocumentCode
    3433937
  • Title
    TD-learning with exploration
  • Author
    Meyn, Sean P.; Surana, Amit
  • Author_Institution
    Department of Electrical and Computer Engineering and the Coordinated Science Laboratory at UIUC, USA
  • fYear
    2011
  • fDate
    12-15 Dec. 2011
  • Firstpage
    148
  • Lastpage
    155
  • Abstract
    We introduce exploration in the TD-learning algorithm to approximate the value function for a given policy. In this way we can modify the norm used for approximation, “zooming in” to a region of interest in the state space. We also provide extensions to SARSA to eliminate the need for numerical integration in policy improvement. Construction of the algorithm and its analysis build on recent general results concerning the spectral theory of Markov chains and positive operators.
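  • Illustrative_Sketch
    To make the abstract's idea concrete, the following is a minimal, generic sketch of TD(0) with linear function approximation in which the trajectory is generated by an exploratory behavior policy. It is not the paper's specific construction (which builds on the spectral theory of Markov chains and positive operators); the toy random-walk chain, feature map, and parameter names are assumptions introduced here for illustration only.

```python
# Minimal sketch (editorial assumption, not the paper's algorithm):
# TD(0) with linear function approximation, where the trajectory is
# generated by an epsilon-exploratory behavior policy. The toy chain,
# features, and parameters below are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_features = 10, 4   # toy random walk on {0, ..., 9}
gamma, alpha, epsilon = 0.95, 0.05, 0.1

Phi = rng.standard_normal((n_states, n_features))  # feature vectors phi(s)

def reward(s):
    return s / (n_states - 1)          # higher-indexed states pay more

def step(s, a):
    return int(np.clip(s + a, 0, n_states - 1))

theta = np.zeros(n_features)           # V(s) is approximated by Phi[s] @ theta
s = 0
for _ in range(20_000):
    # The target policy moves "right"; the behavior policy explores with
    # probability epsilon, which reshapes the state-visitation distribution
    # that weights the approximation error.
    a = 1 if rng.random() > epsilon else rng.choice([-1, 1])
    s_next = step(s, a)
    # Standard TD(0) update along the exploratory trajectory.
    td_error = reward(s_next) + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    theta += alpha * td_error * Phi[s]
    s = s_next

print("approximate values:", np.round(Phi @ theta, 2))
```

    In this sketch the exploration rate epsilon changes the state-visitation distribution, and hence the weighted norm in which the approximation error is measured, mirroring the "zooming in" on a region of the state space described in the abstract.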
  • Keywords
    Approximation algorithms; Equations; Function approximation; Linear approximation; Markov processes; Mathematical model
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Title
    2011 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC)
  • Conference_Location
    Orlando, FL, USA
  • ISSN
    0743-1546
  • Print_ISBN
    978-1-61284-800-6
  • Electronic_ISBN
    0743-1546
  • Type
    conf
  • DOI
    10.1109/CDC.2011.6160851
  • Filename
    6160851