DocumentCode :
3485276
Title :
Randomized maximum entropy language models
Author :
Xu, Puyang ; Khudanpur, Sanjeev ; Gunawardana, Asela
Author_Institution :
Dept. of Electr. & Comput. Eng., Johns Hopkins Univ., Baltimore, MD, USA
fYear :
2011
fDate :
11-15 Dec. 2011
Firstpage :
226
Lastpage :
230
Abstract :
We address the memory problem of maximum entropy language models (MELM) with very large feature sets. Randomized techniques are employed to remove all large, exact data structures from MELM implementations. To avoid the dictionary structure that maps each feature to its corresponding weight, the feature hashing trick [1], [2] can be used. We also replace the explicit storage of features with a Bloom filter. We show through extensive experiments that the false-positive errors of the Bloom filter and random hash collisions do not degrade model performance. Both perplexity and WER improvements are demonstrated by building MELMs that would otherwise be prohibitively large to estimate or store.
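The two randomized devices named in the abstract, hashing features directly into a fixed-size weight array and testing feature membership with a Bloom filter, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name HashedMELM, the methods add_feature/seen/score, the MD5-based hashing, and all array sizes are hypothetical choices made here for clarity, and training of the weights is omitted.

```python
import hashlib

class HashedMELM:
    """Sketch of randomized MELM storage (illustrative, not the paper's code)."""

    def __init__(self, num_weights=2**20, bloom_bits=2**22, bloom_hashes=3):
        self.weights = [0.0] * num_weights       # hashed weight array, no feature->index dict
        self.num_weights = num_weights
        self.bloom = bytearray(bloom_bits // 8)  # Bloom filter bit array
        self.bloom_bits = bloom_bits
        self.bloom_hashes = bloom_hashes

    def _hash(self, feature, seed):
        # Seeded hash of a feature string (MD5 here purely for illustration).
        return int(hashlib.md5(f"{seed}:{feature}".encode()).hexdigest(), 16)

    def _weight_index(self, feature):
        # Feature hashing trick: map the feature string straight to a slot
        # in the weight array; collisions are tolerated, not resolved.
        return self._hash(feature, 0) % self.num_weights

    def add_feature(self, feature):
        # Record a feature observed in training by setting its Bloom bits.
        for s in range(1, self.bloom_hashes + 1):
            bit = self._hash(feature, s) % self.bloom_bits
            self.bloom[bit // 8] |= 1 << (bit % 8)

    def seen(self, feature):
        # Bloom membership test: no false negatives, occasional false positives.
        for s in range(1, self.bloom_hashes + 1):
            bit = self._hash(feature, s) % self.bloom_bits
            if not self.bloom[bit // 8] & (1 << (bit % 8)):
                return False
        return True

    def score(self, features):
        # Unnormalized log score: sum the hashed weights of active features
        # that pass the Bloom filter; unseen features contribute nothing.
        return sum(self.weights[self._weight_index(f)]
                   for f in features if self.seen(f))
```

A false positive in seen() merely activates one extra hashed weight, and a hash collision in _weight_index() merges two features' weights; the paper's experiments indicate that neither kind of error degrades perplexity or WER in practice.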
Keywords :
entropy; natural language processing; random processes; speech recognition; Bloom filter; MELM implementation; automatic speech recognition; false positive error; feature hashing; feature storage; memory problem; random hash collision; randomized maximum entropy language models; Computational modeling; Dictionaries; Entropy; Memory management; Training; Vectors; Vocabulary
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2011 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
Conference_Location :
Waikoloa, HI
Print_ISBN :
978-1-4673-0365-1
Electronic_ISBN :
978-1-4673-0366-8
Type :
conf
DOI :
10.1109/ASRU.2011.6163935
Filename :
6163935