Title :
Recursive Algorithms for Adaptive Control of Finite Markov Chains
Author :
El-Fattah, Yousri M.
Abstract :
The problem of controlling a finite Markov chain so as to maximize the long-run expected reward per unit time is studied. The chain's transition probabilities depend upon an unknown parameter taking values in a subset [a,b] of R^n. A control policy is defined as the probability of selecting a control action for each state of the chain. A Taylor-like expansion formula for the expected reward in terms of policy variations is derived. Based on that result, a recursive stochastic gradient algorithm is presented for the adaptation of the control policy at consecutive times. The gradient depends on the estimated transition parameter, which is also recursively updated using the gradient of the likelihood function. Convergence with probability 1 is proved for the control and estimation algorithms.
Keywords :
Adaptive control; Automatic control; Convergence; Dynamic programming; Learning automata; Linear programming; Optimal control; Parameter estimation; Recursive estimation; Stochastic processes;
Journal_Title :
Systems, Man and Cybernetics, IEEE Transactions on
DOI :
10.1109/TSMC.1981.4308638
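
Illustrative note :
The objective named in the abstract, the long-run expected reward per unit time under a randomized policy, can be made concrete with a small sketch. This is not the paper's adaptive algorithm: it only evaluates a fixed policy on a chain whose transition probabilities are fully known. The two-state, two-action numbers, the function names, and the assumption that the reward depends only on the current state and action are all hypothetical choices made for illustration, and the chain induced by the policy is assumed to be ergodic.

```python
# Hypothetical example: evaluate the long-run average reward of a fixed
# randomized policy on a small, fully known Markov chain (pure Python).

def policy_chain(P, policy):
    """Transition matrix induced by the policy.

    P[a][s][t]   : probability of moving from state s to t under action a
    policy[s][a] : probability of selecting action a in state s
    """
    n_states, n_actions = len(policy), len(P)
    return [[sum(policy[s][a] * P[a][s][t] for a in range(n_actions))
             for t in range(n_states)]
            for s in range(n_states)]

def stationary_distribution(M, max_iterations=10_000, tol=1e-13):
    """Power iteration; assumes the chain with matrix M is ergodic."""
    n = len(M)
    dist = [1.0 / n] * n
    for _ in range(max_iterations):
        nxt = [sum(dist[s] * M[s][t] for s in range(n)) for t in range(n)]
        if max(abs(x - y) for x, y in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    return dist

def average_reward(P, R, policy):
    """Long-run expected reward per unit time; R[s][a] is an assumed reward."""
    dist = stationary_distribution(policy_chain(P, policy))
    return sum(dist[s] * policy[s][a] * R[s][a]
               for s in range(len(policy))
               for a in range(len(P)))

# Made-up data: 2 states, 2 actions.
P = [
    [[0.9, 0.1], [0.5, 0.5]],  # transitions under action 0
    [[0.2, 0.8], [0.1, 0.9]],  # transitions under action 1
]
R = [[1.0, 0.0], [0.0, 2.0]]         # R[s][a]
policy = [[0.5, 0.5], [0.5, 0.5]]    # uniform randomized policy

reward = average_reward(P, R, policy)
print(reward)
```

For these made-up numbers the induced chain has stationary distribution (0.4, 0.6), so the average reward is approximately 0.8. A gradient-type scheme of the kind the abstract describes would adjust the entries of `policy` to increase this quantity while the transition parameter is being estimated; that step is not shown here.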