Title :
Self learning control of constrained Markov chains - a gradient approach
Author :
Vázquez-Abad, Felisa ; Krishnamurthy, Vikram ; Martin, Katerine ; Baltcheva, Irina
Author_Institution :
Dépt. d'Inf. et de Recherche Opér., Univ. de Montréal, Que., Canada
Abstract :
We present stochastic approximation algorithms for computing the locally optimal policy of a constrained average-cost finite-state Markov decision process. The stochastic approximation algorithms require computation of the gradient of the cost function with respect to the parameter that characterizes the randomized policy. This gradient is computed by simulation-based gradient estimation schemes involving weak derivatives. Like neuro-dynamic programming algorithms (e.g., Q-learning or temporal difference methods), the algorithms proposed in this paper are simulation based and do not require explicit knowledge of the underlying parameters, such as transition probabilities. Unlike neuro-dynamic programming methods, however, the proposed algorithms can handle constraints and time-varying parameters. The multiplier-based constrained stochastic gradient algorithm proposed here is also of independent interest in stochastic approximation.
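The abstract's multiplier-based (primal-dual) scheme can be illustrated on a toy problem: descend the Lagrangian in the policy parameter while ascending in the Lagrange multiplier, with decreasing step sizes and projection of the multiplier onto the nonnegative reals. The sketch below is not the paper's algorithm: the two-state MDP, its costs, the sigmoid policy parameterization, and the finite-difference gradient estimator (standing in for the paper's weak-derivative estimator) are all illustrative assumptions.

```python
import numpy as np

# Toy 2-state, 2-action constrained average-cost MDP.
# All numbers are illustrative, not from the paper.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[a, s, s'] under action a=0
              [[0.5, 0.5], [0.6, 0.4]]])  # under action a=1
cost = np.array([[1.0, 2.0], [0.5, 3.0]])  # objective cost[a, s]
con  = np.array([[0.0, 1.0], [2.0, 0.0]])  # constraint cost[a, s]
beta = 0.8                                 # constraint bound

def simulate(theta, T=2000, seed=0):
    """Sample-path averages of objective and constraint cost under the
    randomized policy P(a=1 | s) = sigmoid(theta[s])."""
    rng = np.random.default_rng(seed)
    s, c_avg, d_avg = 0, 0.0, 0.0
    for _ in range(T):
        p1 = 1.0 / (1.0 + np.exp(-theta[s]))
        a = int(rng.random() < p1)
        c_avg += cost[a, s] / T
        d_avg += con[a, s] / T
        s = rng.choice(2, p=P[a, s])
    return c_avg, d_avg

def fd_gradients(theta, h=0.2, seed=0):
    """Central finite-difference gradient estimates with common random
    numbers -- a crude stand-in for the weak-derivative estimator."""
    gc, gd = np.zeros_like(theta), np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = h
        cp, dp = simulate(theta + e, seed=seed)
        cm, dm = simulate(theta - e, seed=seed)
        gc[i], gd[i] = (cp - cm) / (2 * h), (dp - dm) / (2 * h)
    return gc, gd

# Primal-dual stochastic approximation: gradient step on the Lagrangian
# in theta, multiplier ascent on the constraint, projected onto [0, inf).
theta, lam = np.zeros(2), 0.0
for n in range(1, 61):
    a_n, b_n = 2.0 / (n + 10), 0.5 / (n + 10)   # decreasing step sizes
    gc, gd = fd_gradients(theta, seed=n)
    _, d_avg = simulate(theta, seed=n)
    theta -= a_n * (gc + lam * gd)              # primal descent
    lam = max(0.0, lam + b_n * (d_avg - beta))  # projected dual ascent
```

Note that the algorithm never touches the transition matrix `P` except through simulation, which mirrors the model-free property claimed in the abstract; only the simulator itself needs `P`.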
Keywords :
Markov processes; approximation theory; decision theory; gradient methods; learning systems; self-adjusting systems; constrained Markov chains; constrained average cost finite state Markov decision process; gradient approach; gradient estimation schemes; locally optimal policy; self learning control; stochastic approximation algorithms; time varying parameters; weak derivatives; Approximation algorithms; Computational modeling; Cost function; Kernel; Neurodynamics; Optimal control; State-space methods; Stochastic processes; Telecommunication control;
Conference_Title :
Proceedings of the 41st IEEE Conference on Decision and Control, 2002
Print_ISBN :
0-7803-7516-5
DOI :
10.1109/CDC.2002.1184811