• DocumentCode
    3572413
  • Title

    An high-efficient online reinforcement learning algorithm for continuous-state systems

  • Author

    Yuanheng Zhu ; Dongbin Zhao ; Haibo He

  • Author_Institution
    State Key Lab. of Manage. & Control for Complex Syst., Inst. of Autom., Beijing, China
  • fYear
    2014
  • Firstpage
    581
  • Lastpage
    586
  • Abstract
    In this paper, we consider continuous-state systems and pursue a near-optimal policy through online learning. A new online reinforcement learning algorithm, MSEC (Multi-Samples in Each Cell), is proposed. The algorithm combines a state aggregation technique with an efficient exploration principle, making full use of the samples observed online. More concretely, we apply a grid over the continuous state space to partition it into cells. Then, a near-upper Q iteration operator is defined that uses the samples in each cell to produce a near-upper Q function, whose corresponding greedy policy is efficient for exploration. MSEC is a fully model-free algorithm: no system dynamics are required during implementation, and knowledge of the system is collected during online learning. Based on the PAC (Probably Approximately Correct) principle, MSEC can find a near-optimal policy online within a finite time bound. To test its performance, an inverted pendulum is simulated, and the results show that the new algorithm is suitable for solving online optimal control problems.
  • Keywords
    continuous time systems; control engineering computing; iterative methods; learning (artificial intelligence); nonlinear systems; optimal control; probability; MSEC; continuous-state system; efficient exploration principle; greedy policy; inverted pendulum; multisamples in each cell; near-optimal policy; near-upper Q iteration operator; online reinforcement learning; probability approximately correct principle; state aggregation technique; Algorithm design and analysis; Approximation algorithms; Heuristic algorithms; Learning (artificial intelligence); Partitioning algorithms; Polynomials; Upper bound; probability approximately correct; reinforcement learning; state aggregation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Control and Automation (WCICA), 2014 11th World Congress on
  • Type

    conf

  • DOI
    10.1109/WCICA.2014.7052778
  • Filename
    7052778
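
Note: the abstract describes gridding the continuous state space into cells and using the samples stored in each cell to build an optimistic ("near-upper") Q function. The following is a minimal sketch of that general idea only, not the paper's actual operator. The names `cell_index` and `CellSamples`, the `min_samples` threshold, the averaging of backed-up targets, and the pendulum bounds in the usage note are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def cell_index(state, lows, highs, bins):
    """Map a continuous state to the tuple index of its grid cell.

    Hypothetical helper: uniform grid, states outside the bounds are
    clipped into the nearest boundary cell.
    """
    idx = []
    for x, lo, hi, n in zip(state, lows, highs, bins):
        frac = (x - lo) / (hi - lo)
        i = int(math.floor(frac * n))
        idx.append(min(max(i, 0), n - 1))
    return tuple(idx)


class CellSamples:
    """Stores online samples per (cell, action) pair.

    Assumption: a pair with fewer than `min_samples` samples is treated
    optimistically and assigned the largest possible return, which
    pushes a greedy policy toward under-explored cells.
    """

    def __init__(self, lows, highs, bins, min_samples, r_max, gamma):
        self.lows, self.highs, self.bins = lows, highs, bins
        self.min_samples = min_samples
        self.gamma = gamma
        self.v_max = r_max / (1.0 - gamma)  # upper bound on any return
        self.samples = defaultdict(list)

    def _key(self, state, action):
        return (cell_index(state, self.lows, self.highs, self.bins), action)

    def add(self, state, action, reward, next_state):
        self.samples[self._key(state, action)].append((reward, next_state))

    def optimistic_q(self, state, action, next_value):
        """Optimistic Q estimate; `next_value` maps a state to a value."""
        data = self.samples.get(self._key(state, action), [])
        if len(data) < self.min_samples:
            return self.v_max
        targets = [r + self.gamma * next_value(s2) for r, s2 in data]
        return min(self.v_max, sum(targets) / len(targets))
```

For example, with hypothetical pendulum bounds of angle in [-pi, pi] and angular velocity in [-8, 8] split into a 4 x 4 grid, `cell_index((0.0, 0.0), (-math.pi, -8.0), (math.pi, 8.0), (4, 4))` selects one cell, and `optimistic_q` for that cell stays at the upper bound until `min_samples` samples have been added.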