DocumentCode
3485276
Title
Randomized maximum entropy language models
Author
Xu, Puyang ; Khudanpur, Sanjeev ; Gunawardana, Asela
Author_Institution
Dept. of Electr. & Comput. Eng., Johns Hopkins Univ., Baltimore, MD, USA
fYear
2011
fDate
11-15 Dec. 2011
Firstpage
226
Lastpage
230
Abstract
We address the memory problem of maximum entropy language models (MELMs) with very large feature sets. Randomized techniques are employed to remove all large, exact data structures in MELM implementations. To avoid the dictionary structure that maps each feature to its corresponding weight, we use the feature hashing trick [1] [2]. We also replace the explicit storage of features with a Bloom filter. Extensive experiments show that false positive errors of Bloom filters and random hash collisions do not degrade model performance. Both perplexity and WER improvements are demonstrated by building MELMs that would otherwise be prohibitively large to estimate or store.
Keywords
entropy; natural language processing; random processes; speech recognition; Bloom filter; MELM implementation; automatic speech recognition; false positive error; feature hashing; feature storage; memory problem; random hash collision; randomized maximum entropy language models; Computational modeling; Dictionaries; Entropy; Memory management; Training; Vectors; Vocabulary;
fLanguage
English
Publisher
IEEE
Conference_Titel
Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on
Conference_Location
Waikoloa, HI
Print_ISBN
978-1-4673-0365-1
Electronic_ISBN
978-1-4673-0366-8
Type
conf
DOI
10.1109/ASRU.2011.6163935
Filename
6163935