Regional Information Center for Science and Technology - On-line policy optimisation of Bayesian spoken dialogue systems via human interaction

DocumentCode :

1695703

Title :

On-line policy optimisation of Bayesian spoken dialogue systems via human interaction

Author :

Gasic, M. ; Breslin, C. ; Henderson, M. ; Kim, D. ; Szummer, M. ; Thomson, B. ; Tsiakoulis, Pirros ; Young, S.

Author_Institution :

Eng. Dept., Cambridge Univ., Cambridge, UK

Year :

2013

Firstpage :

8367

Lastpage :

8371

Abstract :

A partially observable Markov decision process has been proposed as a dialogue model that enables robustness to speech recognition errors and automatic policy optimisation using reinforcement learning (RL). However, conventional RL algorithms require a very large number of dialogues, necessitating a user simulator. Recently, Gaussian processes have been shown to substantially speed up the optimisation, making it possible to learn directly from interaction with human users. However, early studies have been limited to very low-dimensional spaces and the learning has exhibited convergence problems. Here we investigate learning from human interaction using the Bayesian Update of Dialogue State system. This dynamic Bayesian network-based system has an optimisation space covering more than one hundred features, allowing a wide range of behaviours to be learned. Using an improved policy model and a more robust reward function, we show that stable learning can be achieved that significantly outperforms a simulator-trained policy.

Keywords :

Gaussian processes; Markov processes; belief networks; decision theory; human computer interaction; interactive systems; learning (artificial intelligence); optimisation; speech recognition; Bayesian spoken dialogue systems; Bayesian update; RL algorithms; automatic policy optimisation; convergence problems; dialogue state system; dynamic Bayesian network-based system; human interaction; improved policy model; online policy optimisation; optimisation space; partially observable Markov decision process; reinforcement learning; reward function; speech recognition error robustness; very low dimensional spaces; Abstracts; Optimization; Robustness; Gaussian process; POMDP; dialogue systems

Language :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location :

Vancouver, BC

ISSN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2013.6639297

Filename :

6639297

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1695703