Regional Information Center for Science and Technology - Q-learning and Pontryagin's Minimum Principle

DocumentCode :

3297315

Title :

Q-learning and Pontryagin's Minimum Principle

Author :

Mehta, Prashant ; Meyn, Sean

Author_Institution :

Dept. of Mech. Sci. & Eng., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA

fYear :

2009

fDate :

15-18 Dec. 2009

Firstpage :

3598

Lastpage :

3605

Abstract :

Q-learning is a technique used to compute an optimal policy for a controlled Markov chain based on observations of the system controlled using a non-optimal policy. It has proven to be effective for models with finite state and action spaces. This paper establishes connections between Q-learning and nonlinear control of continuous-time models with general state space and general action space. The main contributions are summarized as follows. (i) The starting point is the observation that the "Q-function" appearing in Q-learning algorithms is an extension of the Hamiltonian that appears in the minimum principle. Based on this observation we introduce the steepest descent Q-learning algorithm to obtain the optimal approximation of the Hamiltonian within a prescribed function class. (ii) A transformation of the optimality equations is performed based on the adjoint of a resolvent operator. This is used to construct a consistent algorithm based on stochastic approximation that requires only causal filtering of observations. (iii) Several examples are presented to illustrate the application of these techniques, including application to distributed control of multi-agent systems.
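
For orientation, the classical finite-state setting mentioned at the start of the abstract can be sketched as follows. This is a minimal illustration of standard tabular Q-learning with cost minimization, not the continuous-time steepest descent algorithm of the paper. The two-state model (`COST`, `NEXT_STATE`), the discount factor, the step size, and all function names are hypothetical choices made for the example. The data are generated by a uniformly random (non-optimal) policy, yet the learned Q-function yields the optimal policy.

```python
import random


def q_learning(cost, next_state, gamma=0.5, steps=5000, step_size=0.5, seed=0):
    """Tabular Q-learning for a deterministic finite model (illustrative only).

    cost[x][u]       : one-step cost of action u in state x (assumed model)
    next_state[x][u] : state reached from x under action u (assumed model)
    """
    rng = random.Random(seed)
    n_states = len(cost)
    n_actions = len(cost[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    x = 0
    for _ in range(steps):
        # Behavior policy: uniformly random actions, i.e. a non-optimal policy.
        u = rng.randrange(n_actions)
        x_next = next_state[x][u]
        # Temporal-difference target: cost now plus discounted minimal cost-to-go.
        target = cost[x][u] + gamma * min(q[x_next])
        q[x][u] += step_size * (target - q[x][u])
        x = x_next
    return q


def greedy_policy(q):
    """Return, for each state, the action that minimizes the learned Q-function."""
    return [min(range(len(row)), key=row.__getitem__) for row in q]


# Hypothetical two-state, two-action model: the action chosen is the next state.
COST = [[1.0, 1.5], [4.0, 0.0]]
NEXT_STATE = [[0, 1], [0, 1]]

q_table = q_learning(COST, NEXT_STATE)
policy = greedy_policy(q_table)
```

In this toy model the minimizing action in each state is read directly off the learned table, which mirrors the role the abstract assigns to the Q-function as the quantity minimized over actions.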

Keywords :

Markov processes; continuous time systems; convex programming; learning (artificial intelligence); minimum principle; nonlinear control systems; state-space methods; Pontryagin minimum principle; Q-learning; continuous time models; controlled Markov chain; distributed control; general action space; general state space; multiagent systems; nonlinear control; optimal Hamiltonian approximation; optimal policy; stochastic approximation; Approximation algorithms; Control systems; Convergence; Dynamic programming; Filtering algorithms; Learning; Nonlinear equations; Optimal control; State-space methods; Stochastic processes;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Proceedings of the 48th IEEE Conference on Decision and Control, 2009, held jointly with the 2009 28th Chinese Control Conference (CDC/CCC 2009)

Conference_Location :

Shanghai

ISSN :

0191-2216

Print_ISBN :

978-1-4244-3871-6

Electronic_ISBN :

0191-2216

Type :

conf

DOI :

10.1109/CDC.2009.5399753

Filename :

5399753

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3297315