Title :
Sample path sharing in simulation-based policy improvement
Author :
Di Wu ; Qing-Shan Jia ; Chun-Hung Chen
Author_Institution :
Dept. of Autom., Tsinghua Univ., Beijing, China
Date :
May 31 2014-June 7 2014
Abstract :
Simulation-based policy improvement (SBPI) has been widely used to improve given base policies through simulation. The basic idea of SBPI is to estimate all the Q-factors for a given state using simulation, and then select the action that achieves the minimal cost. It is therefore of great importance to use the given simulation budget efficiently in order to select the best action with high probability. Unlike existing budget allocation algorithms, which estimate Q-factors by independent simulation, we share the sample paths to improve the probability of correctly selecting the best action. Our method can be combined with equal allocation, Successive Rejects, and optimal computing budget allocation to enhance their probabilities of correct selection as well as to achieve better policies in SBPI. The size of the improvement depends on the overlap in reachable states under different actions. Numerical results show that, with such overlap, combining our method with equal allocation, Successive Rejects, and optimal computing budget allocation produces a higher probability of correct selection as well as better policies in SBPI.
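The basic SBPI step described in the abstract (estimate every Q-factor for a given state by simulation, then pick the minimal-cost action) can be illustrated with a minimal sketch. It shows only the independent-simulation baseline with equal allocation, not the authors' sample path sharing method. The toy random-walk model, the function names (`rollout_cost`, `estimate_q_factors`, `improved_action`, `toy_step`), the finite horizon, and all parameter values are assumptions made for illustration and do not come from the paper.

```python
import random


def rollout_cost(state, base_policy, step, horizon, rng):
    """Accumulated cost of following base_policy from `state` for `horizon` steps."""
    total = 0.0
    for _ in range(horizon):
        state, cost = step(state, base_policy(state), rng)
        total += cost
    return total


def estimate_q_factors(state, actions, base_policy, step, horizon, budget, rng):
    """Equal allocation: budget // len(actions) independent sample paths per action."""
    n = budget // len(actions)
    q = {}
    for a in actions:
        samples = []
        for _ in range(n):
            # First step uses the candidate action, the rest follow the base policy.
            next_state, cost = step(state, a, rng)
            tail = rollout_cost(next_state, base_policy, step, horizon - 1, rng)
            samples.append(cost + tail)
        q[a] = sum(samples) / n
    return q


def improved_action(q):
    """Select the action with the minimal estimated Q-factor."""
    return min(q, key=q.get)


def toy_step(state, action, rng):
    """Hypothetical model: noisy walk on the integers, cost = distance from 0."""
    next_state = state + action + rng.choice((-1, 0, 1))
    return next_state, abs(next_state)


rng = random.Random(0)
q = estimate_q_factors(
    state=10,
    actions=(-1, 0, 1),
    base_policy=lambda s: 0,  # assumed base policy: always "stay"
    step=toy_step,
    horizon=5,
    budget=600,
    rng=rng,
)
best = improved_action(q)
print(q, best)
```

In this sketch every action receives the same number of independent sample paths. Successive Rejects or optimal computing budget allocation would instead distribute `budget` unevenly across actions, and the approach in the paper additionally reuses sample paths across actions whose reachable states overlap.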
Keywords :
budgeting; discrete event simulation; Q-factors estimation; SBPI; budget allocation algorithm; discrete event dynamic system; optimal computing budget allocation; sample path sharing; simulation-based policy improvement; Aggregates; Computational modeling; Estimation; Optimization; Q-factor; Resource management; simulation-based optimization
Conference_Title :
Robotics and Automation (ICRA), 2014 IEEE International Conference on
Conference_Location :
Hong Kong
DOI :
10.1109/ICRA.2014.6907332