  • DocumentCode
    2831817
  • Title
    Learning optimal values from random walk
  • Author
    Lam, K.P.
  • Author_Institution
    Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong
  • fYear
    2005
  • fDate
    16-16 Nov. 2005
  • Lastpage
    339
  • Abstract
    In this paper we extend the random walk example of Sutton and Barto (1998) to a multistage dynamic programming optimization setting with discounted reward. Using Bellman equations on presumed action, the optimal values are derived for a general transition probability ρ and discount rate γ, and include the original random walk as a special case. Temporal difference methods with eligibility traces, TD(λ), are effective in predicting the optimal values for different ρ and γ, but their performance is found to depend critically on the choice of truncated return in the formulation when γ is less than 1.
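    The prediction method named in the abstract, tabular TD(λ) with accumulating eligibility traces on a discounted random walk, can be sketched as follows. The chain size, transition probability RHO, discount GAMMA, step size ALPHA, and trace decay LAM are illustrative assumptions in the spirit of the Sutton and Barto example, not values taken from the paper.

import numpy as np

# Sketch of tabular TD(lambda) prediction on a discounted random walk.
# States 1..N_STATES are non-terminal; 0 and N_STATES+1 are terminal.
# All constants below are illustrative assumptions, not from the paper.
N_STATES = 5
RHO = 0.5        # probability of stepping right (transition probability)
GAMMA = 0.9      # discount rate
ALPHA = 0.1      # step size
LAM = 0.8        # eligibility-trace decay
EPISODES = 1000

rng = np.random.default_rng(0)
V = np.zeros(N_STATES + 2)       # value estimates; terminal entries stay 0

for _ in range(EPISODES):
    e = np.zeros_like(V)         # accumulating eligibility traces
    s = (N_STATES + 1) // 2      # start in the middle state
    while 0 < s < N_STATES + 1:
        s_next = s + 1 if rng.random() < RHO else s - 1
        r = 1.0 if s_next == N_STATES + 1 else 0.0   # reward 1 at right terminal
        delta = r + GAMMA * V[s_next] - V[s]         # TD error
        e[s] += 1.0                                  # bump trace of current state
        V += ALPHA * delta * e                       # update all traced states
        e *= GAMMA * LAM                             # decay traces
        s = s_next

print("Estimated values of non-terminal states:", V[1:-1])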
  • Keywords
    dynamic programming; learning (artificial intelligence); random processes; Bellman equations; discounted reward; eligibility traces; general transition probability; multistage dynamic programming optimization; optimal value learning; random walk; temporal difference methods; Artificial intelligence; Drugs; Dynamic programming; Equations; Hardware; Learning; Neurodynamics; Research and development management; Systems engineering and theory; Very large scale integration;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Title
    17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'05), 2005
  • Conference_Location
    Hong Kong
  • ISSN
    1082-3409
  • Print_ISBN
    0-7695-2488-5
  • Type
    conf
  • DOI
    10.1109/ICTAI.2005.81
  • Filename
    1562957