Title :
Let's DiSCoH: collecting an annotated open corpus with dialogue acts and reward signals for natural language helpdesks
Author :
Andreani, G. ; Di Fabbrizio, G. ; Gilbert, Markus ; Gillick, D. ; Hakkani-Tur, D. ; Lemon, O.
Author_Institution :
Speech Village, Ascoli Piceno
Abstract :
We motivate and explain the DiSCoH project, which uses a publicly deployed spoken dialogue system for conference services to collect a richly annotated corpus of mixed-initiative human-machine spoken dialogues. Users can call a phone number and learn about a conference, including paper submission, program, venue, accommodation options, and costs. The collected corpus is (1) usable for training, evaluating, and comparing statistical models, (2) naturally spoken and task-oriented, (3) extendible and generalizable, (4) collected using state-of-the-art research and commercial technology, and (5) freely available to researchers. We explain the principles behind the dialogue context representations and reward signals collected by the system, as well as the overall system design, call types, and call flow. We also present results for the initial ASR models and spoken language understanding models. We expect the resulting corpora to be used in advanced dialogue research over the coming years.
Keywords :
interactive systems; natural language processing; speech processing; technical support services; DiSCoH project; annotated open corpus; call flow; call types; conference services; mixed-initiative human-machine spoken dialogues; natural language helpdesks; reward signals; spoken dialogue system; Automatic speech recognition; Conferences; Context modeling; Costs; Man machine systems; Management training; Natural languages; Speech processing; Standards development; Stochastic processes
Conference_Title :
Spoken Language Technology Workshop, 2006. IEEE
Conference_Location :
Palm Beach
Print_ISBN :
1-4244-0872-5
DOI :
10.1109/SLT.2006.326794