Title :
Combining the Missing Link: An Incremental Topic Model of Document Content and Hyperlink
Author :
Ma, Huifang ; Li, Zhixin ; Shi, Zhongzhi
Author_Institution :
Key Lab. of Intell. Inf. Process., Chinese Acad. of Sci., Beijing, China
Abstract :
The content and structure of linked information, such as sets of web pages or research paper archives, are dynamic and change continually. Although various methods have been proposed to exploit both link structure and content information, no existing approach deals effectively with this evolution. We propose a novel joint model, called Link-IPLSI, that combines texts and links incrementally in a topic modeling framework. The model relies on a novel link updating technique that copes with dynamic changes in online document streams quickly and scalably. Furthermore, an adaptive asymmetric learning method is adopted to control freely how weights are assigned to terms and citations. Experimental results on two different sources of online information demonstrate the time savings of our method and indicate that our model leads to systematic improvements in the quality of classification and link prediction.
Keywords :
document handling; information analysis; Link-IPLSI; Web pages; document content; hyperlink; incremental topic model; online document streams; online information; topic modeling framework; Adaptive control; Computers; Data mining; Information processing; Intelligent structures; Laboratories; Learning systems; Predictive models; Programmable control
Conference_Title :
Web Conference (APWEB), 2010 12th International Asia-Pacific
Conference_Location :
Busan
Print_ISBN :
978-0-7695-4012-2
Electronic_ISBN :
978-1-4244-6600-9
DOI :
10.1109/APWeb.2010.27