  • DocumentCode
    454715
  • Title
    Bootstrapping Language Models for Spoken Dialog Systems From The World Wide Web
  • Author
    Hakkani-Tür, Dilek; Rahim, Mazin
  • Author_Institution
    Int. Comput. Sci. Inst., Berkeley, CA
  • Volume
    1
  • fYear
    2006
  • fDate
    14-19 May 2006
  • Abstract
    In this paper, we describe our approach for bootstrapping statistical language models for spoken dialog systems using in-domain Web data and utterances collected from previous applications. The approach is based on the idea of stitching conversational templates with the predicates and arguments extracted from Web pages using semantic role labeling, to generate conversational-style utterances. The conversational templates represent the task-independent portions of user utterances and can be built by hand or learned from utterances collected in applications for other domains. Experiments have shown that stitching with both types of conversational templates results in significantly better ASR word accuracy. Furthermore, the new language model bootstrapping approach can be combined with unsupervised and active learning to improve word accuracy even with very little in-domain transcribed data.
  • Keywords
    computer bootstrapping; natural languages; semantic Web; statistical analysis; World Wide Web; active learning; bootstrapping language models; bootstrapping statistical; semantic role labeling; spoken dialog systems; unsupervised learning; Application software; Automatic speech recognition; Data mining; Information filtering; Information filters; Labeling; Natural languages; Training data; Web pages; Web sites;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
  • Conference_Location
    Toulouse
  • ISSN
    1520-6149
  • Print_ISBN
    1-4244-0469-X
  • Type
    conf
  • DOI
    10.1109/ICASSP.2006.1660208
  • Filename
    1660208