• DocumentCode
    2066069
  • Title
    PLSA Based Topic Mixture Language Modeling Approach
  • Author
    Bai, Shuanhu ; Li, Haizhou
  • Author_Institution
    Inst. for Infocomm Res., Singapore, Singapore
  • fYear
    2008
  • fDate
    16-19 Dec. 2008
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    In this paper, we propose a method to extend the use of latent topics to higher-order n-gram models. In training, the parameters of the higher-order n-gram models are estimated from discounted average counts, obtained by applying probabilistic latent semantic analysis (PLSA) models to the n-gram counts of the training corpus. In decoding, a simple yet efficient topic prediction method is introduced to predict the topic of a new document. The proposed topic mixture language model (TMLM) has two advantages over previous methods: 1) it can build topic mixture n-gram LMs (n>1), and 2) it does not require a large general baseline LM. Experimental results show that TMLMs, even with a smaller number of topics, outperform LMs built with both the standard n-gram approach and unsupervised adaptation approaches in terms of perplexity reduction.
  • Keywords
    learning (artificial intelligence); natural language processing; higher order n-gram models; probabilistic latent semantic analysis models; topic mixture language modeling; topic prediction method; training corpus; Algorithm design and analysis; Bayesian methods; Clustering algorithms; Decoding; Displays; Error analysis; Prediction methods; Singular value decomposition; Testing; Text categorization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing, 2008. ISCSLP '08. 6th International Symposium on
  • Conference_Location
    Kunming
  • Print_ISBN
    978-1-4244-2942-4
  • Electronic_ISBN
    978-1-4244-2943-1
  • Type
    conf
  • DOI
    10.1109/CHINSL.2008.ECP.58
  • Filename
    4730312