  • DocumentCode
    2280542
  • Title
    A language model adaptation using multiple varied corpora
  • Author
    Yamamoto, Hirofumi; Sagisaka, Yoshinori
  • Author_Institution
    ATR Spoken Language Translation Res. Labs, Kyoto, Japan
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    389
  • Lastpage
    392
  • Abstract
    A new language model adaptation scheme is proposed to cope with multiple varied speech recognition tasks. The proposed adaptation reflects both topic differences and sentence style differences arising from the speaker's role. Adaptation is carried out using two different language corpora, in each of which only the topic or only the speaker's style matches the target. New word clustering techniques are introduced to extract the topic dependency and the style dependency separately. In this clustering, word neighboring characteristics in the two adaptation source corpora are treated as distinct features. All words are classified into commonly used word classes and topic- or style-dependent classes. Furthermore, target topic- and sentence-style-dependent words and their neighboring characteristics are emphasized according to their frequency in the adaptation target data. In an evaluation experiment, the proposed method yields 13% lower perplexity and a 9% lower word error rate in continuous speech recognition compared with the conventional adaptation method.
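    The idea of keeping neighbor statistics from the two adaptation source corpora as separate features, then emphasizing words that are frequent in the target data, can be illustrated with a toy sketch. This is not the authors' algorithm: the function names, the corpus tags "topic" and "style", the weighting rule 1 + alpha * (relative target frequency of the neighbor), and the example sentences are all hypothetical, and the actual clustering step is omitted (the resulting sparse vectors could be fed to any clustering method).

```python
from collections import Counter, defaultdict
import math


def neighbor_features(sentences, tag):
    """Count the left and right neighbors of every word in one corpus.

    The corpus tag is part of each feature key, so neighbor statistics from
    the topic-matched and style-matched corpora stay in separate dimensions.
    """
    feats = defaultdict(Counter)
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            feats[tokens[i]][(tag, "L", tokens[i - 1])] += 1
            feats[tokens[i]][(tag, "R", tokens[i + 1])] += 1
    return feats


def combined_features(topic_corpus, style_corpus, target_corpus, alpha=1.0):
    """Merge both corpora's neighbor counts and emphasize target-frequent neighbors.

    The weight 1 + alpha * (relative frequency of the neighbor word in the
    target data) is a hypothetical emphasis scheme chosen for illustration.
    """
    topic = neighbor_features(topic_corpus, "topic")
    style = neighbor_features(style_corpus, "style")
    target_counts = Counter(w for s in target_corpus for w in s.split())
    total = sum(target_counts.values()) or 1

    features = {}
    for word in set(topic) | set(style):
        raw = Counter()
        raw.update(topic.get(word, Counter()))
        raw.update(style.get(word, Counter()))
        # key = (corpus tag, side, neighbor word); key[2] is the neighbor
        features[word] = {
            key: count * (1.0 + alpha * target_counts[key[2]] / total)
            for key, count in raw.items()
        }
    return features


def cosine(u, v):
    """Cosine similarity of two sparse feature dictionaries."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    topic_corpus = ["book a hotel room", "hotel room rates"]
    style_corpus = ["i would like a room please", "could i have a taxi please"]
    target_corpus = ["i would like a hotel room please"]
    feats = combined_features(topic_corpus, style_corpus, target_corpus)
    print(round(cosine(feats["room"], feats["taxi"]), 3))
```

    Because the corpus tag is part of every feature key, a word such as "room" that occurs in both source corpora gets topic-derived and style-derived dimensions that never overlap, which mirrors the abstract's statement that the two sources are regarded as different features.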
  • Keywords
    error statistics; feature extraction; natural languages; pattern clustering; pattern matching; speech recognition; clustering techniques; commonly used word classes; continuous speech recognition; feature extraction; language model adaptation; multiple varied corpora; perplexity; sentence style difference; speaker role; speaker style matching; topic difference; word error rate; word neighboring characteristics; Adaptation model; Data mining; Error analysis; Frequency; Natural languages; Speech recognition; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding, 2001. ASRU '01. IEEE Workshop on
  • Print_ISBN
    0-7803-7343-X
  • Type
    conf
  • DOI
    10.1109/ASRU.2001.1034666
  • Filename
    1034666