DocumentCode :
542299
Title :
Building a topic-dependent maximum entropy model for very large corpora
Author :
Wu, Jun ; Khudanpur, Sanjeev
Author_Institution :
Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD 21218, USA
Volume :
1
fYear :
2002
fDate :
13-17 May 2002
Abstract :
Maximum entropy (ME) techniques have been successfully used to combine different sources of linguistically meaningful constraints in language models. However, most current ME models can only be applied to small corpora, since the computational cost of training ME models on large corpora is prohibitive. This problem is especially severe when non-local dependencies are considered. In this paper, we show how to train and use topic-dependent ME models efficiently for a very large corpus, Broadcast News (BN). Training time is greatly reduced by hierarchical training and divide-and-conquer approaches. Computation when using the model is also simplified by pre-normalizing the denominators of the ME model. We report new speech recognition results showing an improvement with the topic model relative to the standard N-gram model on the Broadcast News task.
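For reference, a minimal sketch of the conditional ME (log-linear) form the abstract refers to, assuming a trigram history plus a topic variable t; the specific feature functions f_j and feature set used in the paper are not given here and are an assumption:

P(w_i \mid w_{i-1}, w_{i-2}, t) = \frac{\exp\!\big(\sum_j \lambda_j f_j(w_i, w_{i-1}, w_{i-2}, t)\big)}{Z(w_{i-1}, w_{i-2}, t)},
\qquad
Z(w_{i-1}, w_{i-2}, t) = \sum_{w \in V} \exp\!\big(\sum_j \lambda_j f_j(w, w_{i-1}, w_{i-2}, t)\big).

Because the denominator Z depends only on the history and the topic, it can be computed once per distinct (history, topic) pair after training and stored; evaluating P(w_i | ·) at recognition time then requires only a feature sum and a lookup rather than a fresh sum over the vocabulary V, which is the pre-normalization the abstract describes.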
Keywords :
Computational modeling; Entropy;
fLanguage :
English
Publisher :
ieee
Conference_Title :
2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Conference_Location :
Orlando, FL, USA
ISSN :
1520-6149
Print_ISBN :
0-7803-7402-9
Type :
conf
DOI :
10.1109/ICASSP.2002.5743833
Filename :
5743833