  • DocumentCode
    2330663
  • Title
    Exploiting semantic associative information in topic modeling
  • Author
    Wu, Meng-Sung; Lee, Hung-Shin; Wang, Hsin-Min
  • Author_Institution
    Inst. of Inf. Sci., Acad. Sinica, Taipei, Taiwan
  • fYear
    2010
  • fDate
    12-15 Dec. 2010
  • Firstpage
    384
  • Lastpage
    388
  • Abstract
    Topic modeling has been widely applied in a variety of text modeling tasks, as well as in speech recognition systems, to capture the semantic and statistical information in documents or speech utterances. Most topic models rely on the bag-of-words assumption, which results in learned latent topics composed of lists of individual words. Unfortunately, these words may convey topical information but lack accurate semantic knowledge of the text. In this paper, we present the semantic associative topic model (SATM), in which the concept of semantic association terms is extended to topic modeling. The model provides guidance on modeling the semantic associations that occur among single words by expressing a document as an association of multiple words. Further, the pointwise KL-divergence metric is used to measure the significance of each association. We also integrate the original probabilistic latent semantic analysis (PLSA) and SATM models, which have mixed feature representations. Experimental results on the WSJ and AP datasets show that the proposed approaches achieve higher performance than the other methods.
  • Keywords
    natural language processing; speech recognition; statistical analysis; KL-divergence metric; bag-of-words assumption; exploiting semantic associative information; semantic information; semantic knowledge; speech recognition systems; speech utterances; statistic information; text modeling; topic modeling; information retrieval; language model; semantic association; topic model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language Technology Workshop (SLT), 2010 IEEE
  • Conference_Location
    Berkeley, CA
  • Print_ISBN
    978-1-4244-7904-7
  • Electronic_ISBN
    978-1-4244-7902-3
  • Type
    conf
  • DOI
    10.1109/SLT.2010.5700883
  • Filename
    5700883