Title :
A universal scheme for learning
Author :
Farias, Vivek F. ; Moallemi, Ciamac C. ; Van Roy, Benjamin ; Weissman, Tsachy
Author_Institution :
Department of Electrical Engineering, Stanford University, CA
Abstract :
We consider the problem of optimal control of a Kth-order Markov process so as to minimize long-term average cost, a framework with many applications in communications and beyond. Specifically, we wish to do so without knowledge of either the transition kernel or even the order K. We develop and analyze two algorithms, based on the Lempel-Ziv scheme for data compression, that maintain probability estimates along variable-length contexts. We establish that eventually, with probability 1, the optimal action is taken at each context. Further, in the case of the second algorithm, we establish almost sure asymptotic optimality.
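The idea of maintaining probability estimates along variable-length contexts can be illustrated with a minimal sketch of LZ78-style incremental parsing. This is not the authors' algorithm: it covers only next-symbol estimation and leaves out actions and costs entirely. The class name `ContextTree`, its methods, and the add-1/2 smoothing rule are assumptions chosen for illustration.

```python
class ContextTree:
    """Hypothetical LZ78-style context tree (illustrative names, not from the paper).

    Each node is a context (a tuple of symbols) that stores counts of the
    symbols observed immediately after that context.
    """

    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        self.nodes = {(): {}}   # context -> {symbol: count}
        self.context = ()       # current position in the tree (root = empty context)

    def estimate(self, symbol):
        """Smoothed probability of `symbol` at the current context.

        Uses add-1/2 smoothing, an assumption made for this sketch.
        """
        counts = self.nodes[self.context]
        total = sum(counts.values())
        return (counts.get(symbol, 0) + 0.5) / (total + 0.5 * len(self.alphabet))

    def update(self, symbol):
        """Record `symbol`, then descend the tree or start a new phrase."""
        counts = self.nodes[self.context]
        counts[symbol] = counts.get(symbol, 0) + 1
        child = self.context + (symbol,)
        if child in self.nodes:
            # The extended context has been seen before: keep growing it.
            self.context = child
        else:
            # New phrase: add a leaf and restart from the root, as in LZ78.
            self.nodes[child] = {}
            self.context = ()


tree = ContextTree("ab")
for s in "aaa":
    tree.update(s)
print(sorted(tree.nodes))   # contexts created so far
print(tree.estimate("a"))   # estimate at the current (root) context
```

Because a context only becomes longer after it has been revisited, the tree grows deeper along frequently seen histories, which is what lets a scheme of this kind proceed without fixing the order K in advance.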
Keywords :
Markov processes; cost optimal control; data compression; probability; stochastic systems; Kth order Markov process; long-term average cost; optimal control; probability estimates; Algorithm design and analysis; Cost function; Decoding; Engineering management; Kernel; Memoryless systems; Stochastic processes
Conference_Title :
Proceedings, International Symposium on Information Theory, 2005 (ISIT 2005)
Conference_Location :
Adelaide, SA
Print_ISBN :
0-7803-9151-9
DOI :
10.1109/ISIT.2005.1523523