• DocumentCode
    323516
  • Title
    Comparison of part-of-speech and automatically derived category-based language models for speech recognition
  • Author
    Niesler, T.R. ; Whittaker, E.W.D. ; Woodland, P.C.
  • Author_Institution
    Dept. of Eng., Cambridge Univ., UK
  • Volume
    1
  • fYear
    1998
  • fDate
    12-15 May 1998
  • Firstpage
    177
  • Abstract
    This paper compares various category-based language models when used in conjunction with a word-based trigram by means of linear interpolation. Categories corresponding to parts-of-speech as well as automatically clustered groupings are considered. The category-based model employs variable-length n-grams and permits each word to belong to multiple categories. Relative word error rate reductions of between 2 and 7% over the baseline are achieved in N-best rescoring experiments on the Wall Street Journal corpus. The largest improvement is obtained with a model using automatically determined categories. Perplexities continue to decrease as the number of different categories is increased, but improvements in the word error rate reach an optimum
  • Keywords
    grammars; interpolation; natural languages; pattern recognition; speech processing; speech recognition; N-best rescoring experiments; Wall Street Journal corpus; automatically clustered groupings; automatically determined categories; category-based language models; linear interpolation; part-of-speech; perplexities; variable-length n-grams; word error rate reduction; word-based trigram; Clustering algorithms; Equations; Error analysis; History; Robustness; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing
  • Conference_Location
    Seattle, WA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-4428-6
  • Type
    conf
  • DOI
    10.1109/ICASSP.1998.674396
  • Filename
    674396