DocumentCode :
2757016
Title :
Topic discovery based on dual EM merging
Author :
Zeng, Jianping ; Duan, Jiangjiao ; Wu, Chengrong
Author_Institution :
Sch. of Comput. Sci., Fudan Univ., Shanghai, China
fYear :
2011
fDate :
10-12 July 2011
Firstpage :
83
Lastpage :
88
Abstract :
Given the enormous volume of text on the Internet, automatic topic discovery from large text corpora has become an important task for advanced intelligence information analysis, such as opinion recognition and Web user interest analysis. Although many topic mining methods have shown great success in topic-based analysis tasks, informatics analysis still calls for meaningful topic descriptions. To avoid explaining a topic with words of different granularity, a mechanism is proposed for separating the text corpus into two subsets with equal semantic topics. The EM algorithm is employed to infer topic models for the subsets. A merging process is then devised to generate topic descriptions from the output of EM. Experiments on the standard AP text corpus show that the proposed topic discovery method can achieve better perplexity, which indicates a better ability to predict topics. Furthermore, a topic extraction test on a collection of news documents about the recent Expo 2010 Shanghai China shows that the descriptive keywords in the topics are more meaningful and reasonable than those of traditional topic mining methods.
Keywords :
data mining; expectation-maximisation algorithm; text analysis; EM algorithm; Internet; advanced intelligence information analysis; automatic topic discovery; dual EM merging; informatics analysis; large text corpus; topic mining methods; Computational modeling; Educational institutions; Indexing; Markov processes; Merging; Roads; World Wide Web; EM algorithm; Semantic separation; Topic discovery; Topic quality index;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligence and Security Informatics (ISI), 2011 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4577-0082-8
Type :
conf
DOI :
10.1109/ISI.2011.5984055
Filename :
5984055