DocumentCode :
391314
Title :
Self learning control of constrained Markov chains - a gradient approach
Author :
Vázquez-Abad, Felisa ; Krishnamurthy, Vikram ; Martin, Katerine ; Baltcheva, Irina
Author_Institution :
Dept. d'Inf. et de Recherche Oper., Montreal Univ., Que., Canada
Volume :
2
fYear :
2002
fDate :
10-13 Dec. 2002
Firstpage :
1940
Abstract :
We present stochastic approximation algorithms for computing the locally optimal policy of a constrained average-cost finite-state Markov decision process. The stochastic approximation algorithms require computation of the gradient of the cost function with respect to the parameter that characterizes the randomized policy. This gradient is computed by simulation-based gradient estimation schemes involving weak derivatives. Similar to neuro-dynamic programming algorithms (e.g., Q-learning or temporal difference methods), the algorithms proposed in the paper are simulation-based and do not require explicit knowledge of the underlying parameters such as transition probabilities. However, unlike neuro-dynamic programming methods, the proposed algorithms can handle constraints and time-varying parameters. The proposed multiplier-based constrained stochastic gradient algorithm is also of independent interest in stochastic approximation.
Keywords :
Markov processes; approximation theory; decision theory; gradient methods; learning systems; self-adjusting systems; constrained Markov chains; constrained average cost finite state Markov decision process; gradient approach; gradient estimation schemes; locally optimal policy; self learning control; stochastic approximation algorithms; time varying parameters; weak derivatives; Approximation algorithms; Australia Council; Computational modeling; Cost function; Kernel; Neurodynamics; Optimal control; State-space methods; Stochastic processes; Telecommunication control;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Decision and Control, 2002, Proceedings of the 41st IEEE Conference on
ISSN :
0191-2216
Print_ISBN :
0-7803-7516-5
Type :
conf
DOI :
10.1109/CDC.2002.1184811
Filename :
1184811