DocumentCode :
2270576
Title :
A universal scheme for learning
Author :
Farias, Vivek F. ; Moallemi, Ciamac C. ; Van Roy, Benjamin ; Weissman, Tsachy
Author_Institution :
Dept. of Electr. Eng., Stanford Univ., CA
fYear :
2005
fDate :
4-9 Sept. 2005
Firstpage :
1158
Lastpage :
1162
Abstract :
We consider the problem of optimal control of a Kth-order Markov process so as to minimize long-term average cost, a framework with many applications in communications and beyond. Specifically, we wish to do so without knowledge of either the transition kernel or even the order K. We develop and analyze two algorithms, based on the Lempel-Ziv scheme for data compression, that maintain probability estimates along variable-length contexts. We establish that eventually, with probability 1, the optimal action is taken at each context. Further, in the case of the second algorithm, we establish almost sure asymptotic optimality.
Keywords :
Markov processes; cost optimal control; data compression; probability; stochastic systems; Kth order Markov process; long-term average cost; optimal control; probability estimates; Algorithm design and analysis; Cost function; Decoding; Engineering management; Kernel; Memoryless systems; Stochastic processes
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Theory, 2005. ISIT 2005. Proceedings. International Symposium on
Conference_Location :
Adelaide, SA
Print_ISBN :
0-7803-9151-9
Type :
conf
DOI :
10.1109/ISIT.2005.1523523
Filename :
1523523