Title :
Utilizing Different Link Types to Enhance Document Clustering Based on Markov Random Field Model With Relaxation Labeling
Author :
Zhang, Xiaodan ; Hu, Xiaohua ; Park, E.K. ; Zhou, Xiaohua
Author_Institution :
Coll. of Inf. Sci. & Technol., Drexel Univ., Philadelphia, PA, USA
Abstract :
With the fast-growing number of works that use link information to enhance unsupervised document clustering, a comparative evaluation of the impacts of different link types on document clustering is becoming necessary. Various types of links between text documents, including explicit links such as citation links and hyperlinks, implicit links such as coauthorship and cocitation links, and similarity links such as content similarity links, convey topic similarity or topic transferring patterns, which are very useful for document clustering. In this paper, we adopt a clustering algorithm based on a Markov random field and relaxation labeling, which employs both content and linkage information, to evaluate the effectiveness of the aforementioned types of links for document clustering on ten data sets. The experimental results show that linkage information is quite effective in improving content-based document clustering. Furthermore, our experiments yield a series of important findings regarding the impacts of different link types on document clustering.
Keywords :
Markov processes; citation analysis; pattern clustering; random processes; text analysis; Markov random field model; clustering algorithm; coauthorship links; cocitation links; content information; content similarity links; content-based document clustering; document clustering enhancement; hyperlinks; implicit links; link information; link type; linkage information; relaxation labeling; text documents; topic similarity; topic transferring pattern; unsupervised document clustering; Clustering algorithms; Labeling; Markov random fields; Probabilistic logic; Link-based document clustering; Markov random field (MRF); relaxation labeling (RL)
Journal_Title :
Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on
DOI :
10.1109/TSMCA.2012.2187183