Title :
An Efficient Hybrid Hierarchical Document Clustering Method
Author :
Zhu, Yehang ; Fung, Benjamin C M ; Mu, Dejun ; Li, Yanling
Author_Institution :
Northwestern Polytech. Univ., Xi'an
Abstract :
Document clustering is a technique for grouping document objects such that documents within a cluster have high similarity while documents in different clusters have low similarity. Hierarchical document clustering organizes the clusters into a hierarchy such that a parent cluster represents a more general topic than its child clusters. In this paper, we propose a novel hierarchical document clustering method that is a hybrid of the partitioning and agglomerative clustering approaches. The proposed method inherits the efficiency of the partitioning approach and the hierarchical structure of the agglomerative approach. Experiments on real-life datasets suggest that our method is effective and efficient.
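The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of combining a partitioning stage with an agglomerative stage, not the authors' method. It assumes plain k-means for the partitioning stage and centroid-distance merging for the agglomerative stage; the function names (`partition`, `agglomerate`) and the toy two-dimensional document vectors are hypothetical and chosen purely for illustration.

```python
import math
from itertools import combinations

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def partition(vectors, k, iterations=20):
    """Partitioning stage (assumed here to be plain k-means).
    Seeds with the first k vectors; returns non-empty lists of indices."""
    centroids = [list(v) for v in vectors[:k]]
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for i, v in enumerate(vectors):
            nearest = min(range(k), key=lambda c: distance(v, centroids[c]))
            groups[nearest].append(i)
        # Recompute centroids; keep the old one if a group is empty.
        centroids = [mean([vectors[i] for i in g]) if g else centroids[c]
                     for c, g in enumerate(groups)]
    return [g for g in groups if g]

def agglomerate(vectors, groups):
    """Agglomerative stage (assumed centroid linkage): repeatedly merge
    the two clusters whose centroids are closest. Returns a nested tuple
    tree whose leaves are the flat clusters from the partitioning stage."""
    def centroid(members):
        return mean([vectors[i] for i in members])

    nodes = [(g, g) for g in groups]  # (member indices, subtree)
    while len(nodes) > 1:
        a, b = min(combinations(range(len(nodes)), 2),
                   key=lambda p: distance(centroid(nodes[p[0]][0]),
                                          centroid(nodes[p[1]][0])))
        merged = (nodes[a][0] + nodes[b][0], (nodes[a][1], nodes[b][1]))
        nodes = [n for i, n in enumerate(nodes) if i not in (a, b)] + [merged]
    return nodes[0][1]

# Toy document vectors (hypothetical term weights, three obvious topics).
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [5.0, 5.0], [5.1, 4.9]]
groups = partition(docs, k=3)
tree = agglomerate(docs, groups)
print(groups)
print(tree)
```

In this sketch the expensive pairwise merging runs only over the few clusters produced by the partitioning stage rather than over every document, which is one plausible way to read the efficiency claim in the abstract.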
Keywords :
document handling; agglomerative clustering; document objects grouping; hybrid hierarchical document clustering; Clustering algorithms; Clustering methods; Fuzzy systems; Information retrieval; Itemsets; Keyword search; Partitioning algorithms; Performance analysis; Scalability; Search engines; clustering algorithm; data mining; document clustering; hybrid method
Conference_Title :
Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
Conference_Location :
Shandong
Print_ISBN :
978-0-7695-3305-6
DOI :
10.1109/FSKD.2008.159