Regional Information Center for Science and Technology - Building a topic-dependent maximum entropy model for very large corpora

DocumentCode :

542299

Title :

Building a topic-dependent maximum entropy model for very large corpora

Author :

Wu, Jun ; Khudanpur, Sanjeev

Author_Institution :

Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD 21218, USA

Volume :

fYear :

2002

fDate :

13-17 May 2002

Abstract :

Maximum entropy (ME) techniques have been successfully used to combine different sources of linguistically meaningful constraints in language models. However, most of the current ME models can only be used for small corpora, since the computational load in training ME models for large corpora is unbearable. This problem is especially severe when non-local dependencies are considered. In this paper, we show how to train and use topic-dependent ME models efficiently for a very large corpus, Broadcast News (BN). The training time is greatly reduced by hierarchical training and divide-and-conquer approaches. The computation in using the model is also simplified by pre-normalizing the denominators of the ME model. We report new speech recognition results showing improvement with the topic model relative to the standard N-gram model for the Broadcast News task.
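The abstract mentions speeding up model use by "pre-normalizing the denominators of the ME model." Below is a minimal sketch of that general idea, not the authors' implementation. It assumes a toy conditional ME (log-linear) model with unigram, bigram and topic features, where the normalizer Z(history, topic) is computed once offline and cached. A lookup then needs only one cache read instead of a sum over the whole vocabulary. The vocabulary, topics, feature weights and function names (`score`, `build_normalizers`, `prob`) are all hypothetical. The sketch also uses a bigram history for brevity, whereas the paper's model builds on N-grams.

```python
import math
from itertools import product

# Hypothetical toy setup (illustrative only, not taken from the paper).
VOCAB = ["the", "market", "game", "rose", "won"]
TOPICS = ["finance", "sports"]

# Hypothetical feature weights (lambdas); a missing key means the feature is inactive.
UNIGRAM = {"the": 1.0, "market": 0.2, "game": 0.2, "rose": 0.1, "won": 0.1}
BIGRAM = {("the", "market"): 0.8, ("the", "game"): 0.8,
          ("market", "rose"): 1.2, ("game", "won"): 1.2}
TOPIC = {("finance", "market"): 1.0, ("finance", "rose"): 0.5,
         ("sports", "game"): 1.0, ("sports", "won"): 0.5}


def score(prev: str, topic: str, word: str) -> float:
    """Unnormalized log-score: the sum of the weights of all active features."""
    return (UNIGRAM.get(word, 0.0)
            + BIGRAM.get((prev, word), 0.0)
            + TOPIC.get((topic, word), 0.0))


def build_normalizers() -> dict:
    """Precompute the denominator Z(prev, topic) once for every history/topic pair."""
    return {
        (prev, topic): sum(math.exp(score(prev, topic, w)) for w in VOCAB)
        for prev, topic in product(VOCAB, TOPICS)
    }


Z = build_normalizers()


def prob(word: str, prev: str, topic: str) -> float:
    """P(word | prev, topic), using the cached denominator instead of re-summing the vocabulary."""
    return math.exp(score(prev, topic, word)) / Z[(prev, topic)]


if __name__ == "__main__":
    # The topic feature shifts probability mass toward topic-related words.
    print(prob("market", "the", "finance"), prob("game", "the", "finance"))
```

The design trades memory for speed: evaluation becomes constant time per word, but the cache grows with the number of (history, topic) pairs. A real Broadcast News system could not enumerate every pair as this toy does. How the paper actually organizes the pre-normalization is not described in this record.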

Keywords :

Computational modeling; Entropy

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on

Conference_Location :

Orlando, FL, USA

ISSN :

1520-6149

Print_ISBN :

0-7803-7402-9

Type :

conf

DOI :

10.1109/ICASSP.2002.5743833

Filename :

5743833

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=542299