• DocumentCode
    542299
  • Title
    Building a topic-dependent maximum entropy model for very large corpora
  • Author
    Wu, Jun ; Khudanpur, Sanjeev
  • Author_Institution
    Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD 21218, USA
  • Volume
    1
  • fYear
    2002
  • fDate
    13-17 May 2002
  • Abstract
    Maximum entropy (ME) techniques have been successfully used to combine different sources of linguistically meaningful constraints in language models. However, most of the current ME models can only be used for small corpora, since the computational load in training ME models for large corpora is unbearable. This problem is especially severe when non-local dependencies are considered. In this paper, we show how to train and use topic-dependent ME models efficiently for a very large corpus, Broadcast News (BN). The training time is greatly reduced by hierarchical training and divide-and-conquer approaches. The computation in using the model is also simplified by pre-normalizing the denominators of the ME model. We report new speech recognition results showing improvement with the topic model relative to the standard N-gram model for the Broadcast News task.
  • Keywords
    Computational modeling; Entropy
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
  • Conference_Location
    Orlando, FL, USA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7402-9
  • Type
    conf
  • DOI
    10.1109/ICASSP.2002.5743833
  • Filename
    5743833