• DocumentCode
    2419764
  • Title

    Optimism in reinforcement learning and Kullback-Leibler divergence

  • Author

    Filippi, Sarah ; Cappé, Olivier ; Garivier, Aurélien

  • Author_Institution
    LTCI, TELECOM ParisTech, Paris, France
  • fYear
    2010
  • fDate
    Sept. 29 2010-Oct. 1 2010
  • Firstpage
    115
  • Lastpage
    122
  • Abstract
    We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. In MDPs, optimism can be implemented by carrying out extended value iterations under a constraint of consistency with the estimated model transition probabilities. The UCRL2 algorithm by Auer, Jaksch and Ortner (2009), which follows this strategy, has recently been shown to guarantee near-optimal regret bounds. In this paper, we strongly argue in favor of using the Kullback-Leibler (KL) divergence for this purpose. By studying the linear maximization problem under KL constraints, we provide an efficient algorithm, termed KL-UCRL, for solving KL-optimistic extended value iteration. Using recent deviation bounds on the KL divergence, we prove that KL-UCRL provides the same guarantees as UCRL2 in terms of regret. However, numerical experiments on classical benchmarks show a significantly improved behavior, particularly when the MDP has reduced connectivity. To support this observation, we provide elements of comparison between the two algorithms based on geometric considerations.
  • Keywords
    Markov processes; learning (artificial intelligence); Kullback-Leibler divergence; finite Markov decision process; linear maximization problem; model transition probability; model-based reinforcement learning; optimism; optimistic strategy; Algorithm design and analysis; Benchmark testing; Context modeling; Equations; Learning; Mathematical model; Markov decision processes; Model-based approaches; Regret bounds; Reinforcement learning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
  • Conference_Location
    Allerton, IL
  • Print_ISBN
    978-1-4244-8215-3
  • Type

    conf

  • DOI
    10.1109/ALLERTON.2010.5706896
  • Filename
    5706896