• DocumentCode
    2788221
  • Title
    Language model adaptation using WWW documents obtained by utterance-based queries
  • Author
    Tsiartas, Andreas ; Georgiou, Panayiotis ; Narayanan, Shrikanth
  • Author_Institution
    Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
  • fYear
    2010
  • fDate
    14-19 March 2010
  • Firstpage
    5406
  • Lastpage
    5409
  • Abstract
    In this paper, we consider the estimation of topic-specific Language Models (LMs) by exploiting documents from the World Wide Web (WWW). We focus on the quality of the generated queries and propose a novel query generation method. In contrast to the n-gram-based queries used in past work, our approach relies on utterances as query candidates. The proposed approach does not rely on any language-specific information other than the initial in-domain training text. We have conducted experiments with Web texts of size 0-150 million words, and we have shown that, despite not using any language-specific information, the proposed approach results in up to 1.1% absolute Word Error Rate (WER) improvement compared to keyword-based approaches. The proposed approach reduces the WER by 6.3% absolute in our experiments, compared to an in-domain LM that does not use any Web data.
  • Keywords
    Internet; natural language processing; query processing; text analysis; WWW documents; Web data; Web texts; Word Error Rate; World Wide Web; initial in-domain training text; keyword-based approaches; language model adaptation; language specific information; queries; utterance-based queries; Adaptation model; Automatic speech recognition; Internet; Laboratories; Natural languages; Search engines; Speech analysis; Vocabulary; Web sites; World Wide Web; Adapt language models; WWW corpora; in-domain documents; utterance queries;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-4295-9
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2010.5494928
  • Filename
    5494928