DocumentCode
672337
Title
Expert-based reward shaping and exploration scheme for boosting policy learning of dialogue management
Author
Ferreira, Eija ; Lefevre, Francois
Author_Institution
LIA, Univ. of Avignon, Avignon, France
fYear
2013
fDate
8-12 Dec. 2013
Firstpage
108
Lastpage
113
Abstract
This paper investigates the conditions under which expert knowledge can be used to accelerate the policy optimization of a learning agent. Recent work on reinforcement learning for dialogue management has produced sophisticated value estimation methods that jointly address the exploration/exploitation dilemma, sample efficiency, and non-stationary environments. In this paper, a reward shaping method and an exploration scheme, both based on intuitive hand-coded expert advice, are combined with an efficient temporal difference-based learning procedure. The key objective is to boost the initial training stage, when the system is not yet reliable enough to interact with real users (e.g. clients). Our claims are illustrated by simulation-based experiments carried out using a state-of-the-art goal-oriented dialogue management framework, the Hidden Information State (HIS).
Keywords
expert systems; interactive systems; learning (artificial intelligence); optimisation; HIS; expert knowledge; expert-based reward shaping; exploration scheme; goal-oriented dialogue management; hidden information state; learning agent; policy learning; policy optimization; reinforcement learning; temporal difference-based learning; Context; Convergence; Noise; Space exploration; Training; Uncertainty; Vectors; dialogue management; reward shaping; value function approximation
fLanguage
English
Publisher
IEEE
Conference_Titel
Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
Conference_Location
Olomouc
Type
conf
DOI
10.1109/ASRU.2013.6707714
Filename
6707714