Title :
From minimax value to low-regret algorithms for online Markov decision processes
Author :
Guan, Peng ; Raginsky, Maxim ; Willett, Rebecca
Author_Institution :
Dept. of Electr. & Comput. Eng., Duke Univ., Durham, NC, USA
Abstract :
The standard Markov Decision Process (MDP) framework assumes a stationary (or at least predictable) environment. Online learning algorithms can handle non-stationary or unpredictable environments, but they lack a notion of a state that evolves over the course of learning as a function of past actions. In recent years, there has been growing interest in combining these two frameworks by considering an MDP setting in which the cost function is allowed to change arbitrarily after each time step. However, most of the work in this area has been algorithmic: given a problem, one designs an algorithm from scratch and analyzes its performance on a case-by-case basis. Moreover, the presence of state and the assumption of an arbitrarily varying environment complicate both the theoretical analysis and the development of computationally efficient methods. This paper builds on recent results of Rakhlin et al. to give a general framework for deriving algorithms in an MDP setting with arbitrarily changing costs. This framework leads to a unifying view of existing methods and provides a general procedure for constructing new ones.
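For readers new to this setting, the performance measure named in the title is typically formalized as follows; the notation below is a standard textbook formulation given for illustration, not quoted from the paper itself:

\[
R_T \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T} c_t(X_t, A_t)\right] \;-\; \min_{\pi \in \Pi}\, \mathbb{E}\!\left[\sum_{t=1}^{T} c_t(X_t^{\pi}, \pi(X_t^{\pi}))\right],
\]

where $(X_t, A_t)$ is the learner's state-action trajectory, $c_1, c_2, \ldots$ is the arbitrarily varying sequence of cost functions, and $\Pi$ is a fixed comparison class of stationary policies followed on their own trajectories $X_t^{\pi}$. A low-regret algorithm is one that guarantees $R_T = o(T)$ no matter how the cost sequence is chosen.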
Keywords :
Markov processes; minimax techniques; MDP; low-regret algorithms; minimax value; nonstationary environments; online Markov decision processes; online learning algorithms; unpredictable environments; Algorithm design and analysis; Cost function; Games; Heuristic algorithms; Kernel; State feedback; Machine learning
Conference_Titel :
American Control Conference (ACC), 2014
Conference_Location :
Portland, OR, USA
Print_ISBN :
978-1-4799-3272-6
DOI :
10.1109/ACC.2014.6858844