DocumentCode :
3268656
Title :
Large-scale tabular-form hardware architecture for Q-Learning with delays
Author :
Liu, Zhenzhen ; Elhanany, Itamar
Author_Institution :
Univ. of Tennessee, Knoxville
fYear :
2007
fDate :
5-8 Aug. 2007
Firstpage :
827
Lastpage :
830
Abstract :
Q-Learning is a popular reinforcement learning algorithm that has been widely used in stochastic control applications. The bottleneck in applying tabular-form Q-Learning to problems with large-scale or high-dimensional action sets is the considerable delay caused by action selection and value-function updates. In this paper, we present a novel hardware architecture that significantly reduces these delays. Moreover, we formulate the Q-Learning algorithm in the presence of observation and action delays and provide a set of proofs confirming that Q-Learning with such delays converges to the optimal policy.
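For context, the serial bottleneck the abstract refers to can be seen in a minimal software sketch of standard tabular Q-Learning. This is an illustration only, not the paper's hardware design or its delayed formulation; all names, sizes, and parameters below are assumptions.

import numpy as np

# Hypothetical large tabular problem; sizes chosen only for illustration.
n_states, n_actions = 1024, 256
alpha, gamma, eps = 0.1, 0.95, 0.1   # learning rate, discount, exploration
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def select_action(s):
    # Epsilon-greedy selection: the argmax scans every action, an O(|A|)
    # serial delay in software that dedicated hardware can parallelize.
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def update(s, a, r, s_next):
    # One-step Q-Learning value-function update; the max over actions
    # incurs the same O(|A|) scan as action selection.
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

Both per-step operations scale linearly with the action-set size when executed serially, which is why large or high-dimensional action sets motivate the hardware acceleration the paper proposes.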
Keywords :
delays; learning (artificial intelligence); Q-learning; large-scale tabular-form hardware; optimal policy; reinforcement learning algorithm; stochastic control; Hardware; Large-scale systems
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Circuits and Systems, 2007. MWSCAS 2007. 50th Midwest Symposium on
Conference_Location :
Montreal, Que.
ISSN :
1548-3746
Print_ISBN :
978-1-4244-1175-7
Electronic_ISBN :
1548-3746
Type :
conf
DOI :
10.1109/MWSCAS.2007.4488701
Filename :
4488701