• DocumentCode
    1454183
  • Title
    Modeling long distance dependence in language: topic mixtures versus dynamic cache models
  • Author
    Iyer, Rukmini M.; Ostendorf, Mari
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Boston Univ., MA, USA
  • Volume
    7
  • Issue
    1
  • fYear
    1999
  • fDate
    1/1/1999
  • Firstpage
    30
  • Lastpage
    39
  • Abstract
    Standard statistical language models use n-grams to capture local dependencies, or use dynamic modeling techniques to track dependencies within an article. In this paper, we investigate a new statistical language model that captures topic-related dependencies of words within and across sentences. First, we develop a topic-dependent, sentence-level mixture language model that takes advantage of the topic constraints in a sentence or article. Second, we introduce topic-dependent dynamic adaptation techniques in the framework of the mixture model, using n-gram caches and content word unigram caches. Experiments with the static (or unadapted) mixture model on the North American Business (NAB) task show a 21% reduction in perplexity and a 3-4% improvement in recognition accuracy over a general n-gram model, giving a larger gain than that obtained with supervised dynamic cache modeling. Further experiments on the Switchboard corpus also show a small improvement in performance with the sentence-level mixture model. Cache modeling techniques introduced in the mixture framework contribute a further 14% reduction in perplexity and a small improvement in recognition accuracy on the NAB task for both supervised and unsupervised adaptation.
  • Keywords
    natural languages; speech recognition; statistical analysis; North American Business task; Switchboard corpus; content word unigram caches; dynamic cache models; language; long distance dependence; mixture framework; n-gram caches; perplexity; recognition accuracy; sentences; statistical language model; supervised adaptation; topic mixtures; topic-dependent dynamic adaptation; topic-dependent sentence-level mixture language model; topic-related dependencies; unsupervised adaptation; words; Costs; Markov processes; Natural languages; Power engineering and energy; Power engineering computing; Probability; Speech recognition; Standards development; Stochastic processes
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type
    jour
  • DOI
    10.1109/89.736328
  • Filename
    736328