Title :
Bootstrapping Language Models for Spoken Dialog Systems From The World Wide Web
Author :
Hakkani-Tür, Dilek ; Rahim, Mazin
Author_Institution :
Int. Comput. Sci. Inst., Berkeley, CA
Abstract :
In this paper, we describe our approach for bootstrapping statistical language models for spoken dialog systems using in-domain Web data and utterances collected from previous applications. The approach is based on the idea of stitching conversational templates with the predicates and arguments extracted from Web pages using semantic role labeling, to generate conversational-style utterances. The conversational templates represent the task-independent portions of user utterances and can be built by hand or learned from utterances collected in other domain applications. Experiments have shown that stitching with both types of conversational templates results in significantly better ASR word accuracy. Furthermore, the new language model bootstrapping approach can be combined with unsupervised and active learning to improve word accuracy even with very little in-domain transcribed data.
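A minimal sketch of the template-stitching idea described in the abstract, offered as an illustration under stated assumptions and not as the authors' implementation. The carrier-phrase templates, the predicate-argument tuples (hardcoded here in place of real semantic role labeler output over Web pages), and the helper names frame_to_phrase, stitch and bigram_counts are all hypothetical. The generated utterances are turned into bigram counts simply to show how stitched text could feed n-gram language model training.

```python
from collections import Counter
from itertools import product

# Hypothetical conversational templates: task-independent carrier phrases.
# "{frame}" marks where a predicate-argument phrase is inserted.
TEMPLATES = [
    "i would like to {frame}",
    "can you help me {frame}",
    "i need to {frame} please",
]

# Hypothetical (predicate, arguments) tuples, standing in for the output
# of a semantic role labeler run over in-domain Web pages.
FRAMES = [
    ("check", ["my account balance"]),
    ("pay", ["my bill", "online"]),
]


def frame_to_phrase(predicate, arguments):
    """Linearize a predicate and its arguments into a verb phrase."""
    return " ".join([predicate] + arguments)


def stitch(templates, frames):
    """Fill every template with every frame to generate utterances."""
    return [
        template.format(frame=frame_to_phrase(pred, args))
        for template, (pred, args) in product(templates, frames)
    ]


def bigram_counts(utterances):
    """Count bigrams with sentence-boundary markers as LM training input."""
    counts = Counter()
    for utt in utterances:
        tokens = ["<s>"] + utt.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


if __name__ == "__main__":
    utterances = stitch(TEMPLATES, FRAMES)
    for utterance in utterances:
        print(utterance)
    print(bigram_counts(utterances)[("i", "would")])
```

With three templates and two frames the sketch yields six conversational-style utterances, such as "i would like to check my account balance". A real system would use far larger template sets (hand-built or learned from other applications) and would smooth the resulting n-gram estimates.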
Keywords :
computer bootstrapping; natural languages; semantic Web; statistical analysis; World Wide Web; active learning; bootstrapping language models; bootstrapping statistical; semantic role labeling; spoken dialog systems; unsupervised learning; Application software; Automatic speech recognition; Data mining; Information filtering; Information filters; Labeling; Natural languages; Training data; Web pages; Web sites;
Conference_Title :
Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
Conference_Location :
Toulouse
Print_ISBN :
1-4244-0469-X
DOI :
10.1109/ICASSP.2006.1660208