DocumentCode
2704423
Title
On Compressing N-Gram Language Models
Author
Hirsimäki, T.
Author_Institution
Adaptive Inf. Res. Centre, Helsinki Univ. of Technol., Espoo, Finland
Volume
4
fYear
2007
fDate
15-20 April 2007
Abstract
In large-vocabulary speech recognition systems, most of the memory is typically consumed by a large n-gram language model. Representing the language model compactly is therefore important in recognition systems targeted at small devices with limited memory. This paper extends the compressed language model structure proposed earlier by Whittaker and Raj. By separating the n-grams that are prefixes of longer n-grams, redundant information can be omitted. Experiments on English 4-gram models and Finnish 6-gram models show that the extended structure can achieve lossless memory reductions of up to 30% compared to the baseline structure of Whittaker and Raj.
Keywords
data compression; natural language processing; speech coding; speech recognition; English 4-gram models; Finnish 6-gram models; compressing n-gram language models; large-vocabulary speech recognition systems; Data compression; Data structures; Entropy; Informatics; Natural languages; Speech recognition; Target recognition; Text recognition; Tree data structures; Vocabulary; Modeling
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location
Honolulu, HI
ISSN
1520-6149
Print_ISBN
1-4244-0727-3
Type
conf
DOI
10.1109/ICASSP.2007.367228
Filename
4218259