Adaptive categorical understanding for spoken dialogue systems

Author

Potamianos, Alexandros ; Narayanan, Shrikanth ; Riccardi, Giuseppe

Author_Institution

Dept. of Electron. & Comput. Eng., Tech. Univ. of Crete, Chania, Greece

Volume

13

Issue

3

fYear

2005

fDate

5/1/2005

Firstpage

321

Lastpage

329

Abstract

In this paper, the speech understanding problem in the context of a spoken dialogue system is formalized in a maximum likelihood framework. Off-line adaptation of stochastic language models that interpolate dialogue-state-specific and general application-level language models is proposed. Word and dialogue-state n-grams are used for building categorical understanding and dialogue models, respectively. Acoustic confidence scores are incorporated in the understanding formulation. Problems due to data sparseness and out-of-vocabulary words are discussed. The performance of the speech recognition and understanding language models is evaluated with the "Carmen Sandiego" multimodal computer game corpus. Incorporating dialogue models reduces the understanding error rate by a relative 15%-25%, while acoustic confidence scores achieve a further 10% error reduction for this computer gaming application.

Keywords

computer games; interactive systems; maximum likelihood estimation; natural languages; speech recognition; acoustic confidence score; adaptive categorical understanding; computer gaming application; data sparseness; dialogue state language model; error rate; error reduction; maximum likelihood framework; multimodal computer game corpus; out-of-vocabulary word; speech understanding problem; spoken dialogue system; Acoustic applications; Adaptation model; Computer applications; Computer errors; Error analysis; Maximum likelihood decoding; Natural languages; Routing; Speech recognition; Stochastic processes; Acoustic confidence scores; dialogue modeling; language model adaptation; n-gram models; natural language processing; speech understanding

fLanguage

English

Journal_Title

Speech and Audio Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1063-6676

Type

jour

DOI

10.1109/TSA.2005.845836

Filename

1420367