DocumentCode
1795686
Title
Adaptive demand response: Online learning of restless and controlled bandits
Author
Qingsi Wang ; Mingyan Liu ; Mathieu, Johanna L.
Author_Institution
Univ. of Michigan, Ann Arbor, MI, USA
fYear
2014
fDate
3-6 Nov. 2014
Firstpage
752
Lastpage
757
Abstract
The capabilities of electric loads participating in load curtailment programs are often unknown until the loads have been told to curtail (i.e., deployed) and observed. In programs in which payments are made each time a load is deployed, we aim to pick the “best” loads to deploy in each time step. Our choice is a tradeoff between exploration and exploitation, i.e., curtailing poorly characterized loads in order to better characterize them in the hope of benefiting in the future versus curtailing well-characterized loads so that we benefit now. We formulate this problem as a multi-armed restless bandit problem with controlled bandits. In contrast to past work, which has assumed that all load parameters are known and thus allowed the use of optimization approaches, we assume the parameters of the controlled system are unknown and develop an online learning approach. Our problem has two features not commonly addressed in the bandit literature: the arms/processes evolve according to different probabilistic laws depending on the control, and the reward/feedback observed by the decision-maker is the total realized curtailment, not the curtailment of each load. We develop an adaptive demand response learning algorithm and an extended version that works with aggregate feedback, both aimed at approximating the Whittle index policy. We show numerically that the regret of our algorithms with respect to the Whittle index policy is of logarithmic order in time, and that our algorithms significantly outperform standard learning algorithms like UCB1.
Keywords
demand side management; learning (artificial intelligence); optimisation; power engineering computing; probability; Whittle index policy; adaptive demand response learning algorithm; aggregate feedback; controlled bandits; decision-maker; electric loads; load curtailment programs; load parameters; multiarmed restless bandit problem; online learning approach; optimization approach; probabilistic laws; standard UCB1 learning algorithms; Aggregates; Heuristic algorithms; Indexes; Load management; Load modeling; Markov processes; Process control;
fLanguage
English
Publisher
IEEE
Conference_Titel
Smart Grid Communications (SmartGridComm), 2014 IEEE International Conference on
Conference_Location
Venice
Type
conf
DOI
10.1109/SmartGridComm.2014.7007738
Filename
7007738