DocumentCode :
391043
Title :
Gradient-based policy iteration: an example
Author :
Cao, Xi-Ren ; Fang, Hai-Tao
Author_Institution :
Dept. of Electr. & Electron. Eng., Hong Kong Univ. of Sci. & Technol., China
Volume :
3
fYear :
2002
fDate :
10-13 Dec. 2002
Firstpage :
3367
Abstract :
Research indicates that perturbation analysis (PA), Markov decision processes (MDP), and reinforcement learning (RL) are three closely related areas in discrete event dynamic system optimization. In particular, it was shown that policy iteration in fact chooses, for the next iteration, the policy with the steepest performance gradient (as provided by PA). This sensitivity-based view of MDPs leads to some new research topics. We propose to implement policy iteration based on performance gradients. This approach is particularly useful when the actions at different states are correlated, so that standard policy iteration cannot be applied. We illustrate the main ideas with an example of an M/G/1/N queue and identify some further research topics.
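To make the idea in the abstract concrete, the following is a minimal sketch, in Python, of policy iteration driven by performance gradients for a small average-reward MDP. The two-state, two-action MDP data, the function names, and the stopping tolerance are invented for illustration and are not the paper's M/G/1/N example; the directional-derivative formula pi_d[(f_h - f_d) + (P_h - P_d) g_d], with g_d the performance potentials from the Poisson equation, is the standard perturbation-analysis expression the abstract refers to.

import numpy as np

# Illustrative 2-state, 2-action average-reward MDP (not the paper's example).
# P[a] is the transition matrix under action a; f[a] is the reward vector.
P = {0: np.array([[0.9, 0.1],
                  [0.2, 0.8]]),
     1: np.array([[0.5, 0.5],
                  [0.7, 0.3]])}
f = {0: np.array([1.0, 0.0]),
     1: np.array([0.5, 2.0])}

n_states, actions = 2, [0, 1]

def policy_matrices(d):
    """Transition matrix and reward vector induced by a deterministic policy d."""
    Pd = np.vstack([P[d[s]][s] for s in range(n_states)])
    fd = np.array([f[d[s]][s] for s in range(n_states)])
    return Pd, fd

def stationary(Pd):
    """Stationary distribution pi of the ergodic chain Pd (pi Pd = pi, pi e = 1)."""
    A = np.vstack([Pd.T - np.eye(n_states), np.ones(n_states)])
    b = np.append(np.zeros(n_states), 1.0)
    return np.linalg.lstsq(A, b, rcond=None)[0]

def potentials(Pd, fd, pi):
    """Performance potentials g solving the Poisson equation (I - Pd + e pi) g = fd."""
    e = np.ones(n_states)
    return np.linalg.solve(np.eye(n_states) - Pd + np.outer(e, pi), fd)

def performance_gradient(d, h):
    """Directional derivative of the average reward at policy d toward policy h:
       d(eta)/d(delta) = pi_d [ (f_h - f_d) + (P_h - P_d) g_d ]."""
    Pd, fd = policy_matrices(d)
    Ph, fh = policy_matrices(h)
    pi = stationary(Pd)
    g = potentials(Pd, fd, pi)
    return pi @ ((fh - fd) + (Ph - Pd) @ g)

def gradient_based_policy_iteration(d):
    """Move to the neighbouring policy with the steepest positive performance
       gradient until no improving direction remains."""
    while True:
        candidates = [(s, a) for s in range(n_states) for a in actions
                      if a != d[s]]
        grads = [(performance_gradient(d, {**d, s: a}), s, a)
                 for s, a in candidates]
        best, s, a = max(grads)
        if best <= 1e-12:          # no ascent direction: policy is optimal
            return d
        d = {**d, s: a}

if __name__ == "__main__":
    d0 = {0: 0, 1: 0}              # arbitrary initial policy
    print("optimal policy:", gradient_based_policy_iteration(d0))

In this sketch the candidate set consists of single-state action changes, mirroring the observation that policy iteration selects the steepest-ascent direction among neighbouring policies; when actions at different states are correlated, as the abstract notes, the candidate set can instead be restricted to the feasible policies and the same gradient test applied.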
Keywords :
Markov processes; decision theory; discrete event systems; gradient methods; iterative methods; learning (artificial intelligence); probability; queueing theory; M/G/1/N queue; Markov decision processes; Q-learning; W-factors; discrete event dynamic system optimization; gradient-based policy iteration; performance gradients; perturbation analysis; reinforcement learning; sensitivity; Control systems; Convergence; Laboratories; Mathematics; Optimization; Performance analysis; Poisson equations; Stochastic processes; System performance;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Proceedings of the 41st IEEE Conference on Decision and Control, 2002
ISSN :
0191-2216
Print_ISBN :
0-7803-7516-5
Type :
conf
DOI :
10.1109/CDC.2002.1184395
Filename :
1184395