• DocumentCode
    2704423
  • Title
    On Compressing N-Gram Language Models
  • Author
    Hirsimaki, T.
  • Author_Institution
    Adaptive Inf. Res. Centre, Helsinki Univ. of Technol., Espoo, Finland
  • Volume
    4
  • fYear
    2007
  • fDate
    15-20 April 2007
  • Abstract
    In large-vocabulary speech recognition systems, most of the memory is typically consumed by a large n-gram language model. Representing the language model compactly is important in recognition systems targeted at small devices with limited memory resources. This paper extends the compressed language model structure proposed earlier by Whittaker and Raj. By separating n-grams that are prefixes of longer n-grams, redundant information can be omitted. Experiments on English 4-gram models and Finnish 6-gram models show that the extended structure can achieve lossless memory reductions of up to 30% compared to the baseline structure of Whittaker and Raj.
  • Keywords
    data compression; natural language processing; speech coding; speech recognition; English 4-gram models; Finnish 6-gram models; compressing n-gram language models; large-vocabulary speech recognition systems; Data structures; Entropy; Informatics; Natural languages; Target recognition; Text recognition; Tree data structures; Vocabulary; Modeling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
  • Conference_Location
    Honolulu, HI
  • ISSN
    1520-6149
  • Print_ISBN
    1-4244-0727-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.2007.367228
  • Filename
    4218259