Title :
Exploiting semantic associative information in topic modeling
Author :
Wu, Meng-Sung ; Lee, Hung-Shin ; Wang, Hsin-Min
Author_Institution :
Inst. of Inf. Sci., Acad. Sinica, Taipei, Taiwan
Abstract :
Topic modeling has been widely applied in a variety of text modeling tasks as well as in speech recognition systems for effectively capturing the semantic and statistic information in documents or speech utterances. Most topic models rely on the bag-of-words assumption that results in learned latent topics composed of lists of individual words. Unfortunately, these words may convey topical information but lack accurate semantic knowledge of the text. In this paper, we present the semantic associative topic model, where the concept of the semantic association terms is extended to topic modeling, which provides guidance on modeling the semantic associations that occur among single words by expressing a document as an association of multiple words. Further, the pointwise KL-divergence metric is used to measure the significance of the association. We also integrate original PLSA and SATM models, which have mixed feature representations. Experimental results on WSJ and AP datasets show that the proposed approaches achieved higher performance compared to other methods.
Keywords :
natural language processing; speech recognition; statistical analysis; KL-divergence metric; bag-of-words assumption; exploiting semantic associative information; semantic information; semantic knowledge; speech recognition systems; speech utterances; statistic information; text modeling; topic modeling; information retrieval; language model; semantic association; topic model;
Conference_Titel :
Spoken Language Technology Workshop (SLT), 2010 IEEE
Conference_Location :
Berkeley, CA
Print_ISBN :
978-1-4244-7904-7
Electronic_ISBN :
978-1-4244-7902-3
DOI :
10.1109/SLT.2010.5700883