Title :
Annotation-aware web clustering based on topic model and random walks
Author :
Sun, Jiashen ; Wang, Xiaojie ; Yuan, Caixia ; Fang, Guannan
Author_Institution :
Dept. of Comput. Sci., Beijing Univ. of Posts & Telecommun., Beijing, China
Abstract :
Web page clustering based on semantics or topics promises improved search and browsing on the web. Intuitively, tags from social bookmarking websites such as del.icio.us can serve as a complementary source to the document text, thus improving the clustering of web pages. In this paper, we present a novel model that employs a topic model to associate each annotated document with a distribution over topics, then constructs a graph of tags, documents, and topics and performs random walks on it for clustering. We examine the performance of our model on a real-world data set, showing that it achieves better clustering performance than algorithms that use page text alone.
Keywords :
Internet; Web sites; document handling; pattern clustering; Web pages; annotation aware Web clustering; document source; random walks; social bookmarking Websites; topic model; Clustering algorithms; Data models; Measurement; Probability; Web search; social tagging; web clustering
Conference_Title :
Cloud Computing and Intelligence Systems (CCIS), 2011 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-61284-203-5
DOI :
10.1109/CCIS.2011.6045023