Title :
Research on Mining Common Concern via Infinite Topic Modelling
Author :
Yishu Miao ; Chunping Li ; Qiang Ding ; Li Li
Author_Institution :
Sch. of Software, Tsinghua Univ., Beijing, China
Abstract :
This paper focuses on mining the common concerns shared by different textual data sources, and on analyzing the eigen topics specific to each source, via infinite topic modelling. By incorporating non-parametric Bayesian approaches, our work achieves good performance and accords better with reality by avoiding restrictive assumptions. We propose two extensions of the Dirichlet process (DP), the bidirectional stick-breaking process and the multi-branches process, both based on the stick-breaking construction, which model multiple sequences of probability measures in one process rather than simply combining several DPs. Building on this new perspective on the DP, we discover the common topics and eigen topics via infinite topic modelling in a simple way, without setting the number of topics in advance. The experiments are carried out on three corpora of BBC news, about the UK, the US and China forums respectively. The results present the common concerns of these three regions and their own eigen interests in other aspects.
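For readers unfamiliar with the stick-breaking construction the abstract builds on, the following is a minimal sketch of the standard, truncated single-DP version only. It is not the bidirectional or multi-branches process proposed in the paper, whose details the abstract does not give. The function name `stick_breaking_weights`, the concentration value `alpha=2.0`, and the truncation level of 50 are illustrative assumptions.

```python
import random


def stick_breaking_weights(alpha, num_sticks, seed=0):
    """Draw truncated stick-breaking weights for a single Dirichlet process.

    Assumption: this is the textbook construction, where each break
    proportion is Beta(1, alpha) and each weight is that proportion of
    whatever stick length is still left.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to any topic
    for _ in range(num_sticks):
        beta_k = rng.betavariate(1.0, alpha)  # fraction broken off this step
        weights.append(beta_k * remaining)
        remaining *= 1.0 - beta_k
    return weights, remaining


# Hypothetical settings chosen only for illustration.
weights, leftover = stick_breaking_weights(alpha=2.0, num_sticks=50)
print(sum(weights) + leftover)  # assigned mass plus leftover stick length
```

The weights play the role of topic proportions: because the stick can keep being broken, the number of topics need not be fixed in advance, and truncation is only a computational convenience.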
Keywords :
Bayes methods; data analysis; data mining; probability; BBC news; China; DP; Dirichlet process; UK; US; bidirectional stick-breaking process; common concern mining; infinite topic modelling; multibranches process; multiple sequences; nonparametric Bayesian approaches; probability measurement; textual data sources; Common Concern; Hierarchical Dirichlet Process; Infinite Topic Modelling; News;
Conference_Title :
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2012 IEEE/WIC/ACM International Conferences on
Conference_Location :
Macau
Print_ISBN :
978-1-4673-6057-9
DOI :
10.1109/WI-IAT.2012.159