Title :
Application of the SpecHybrid Algorithm to text document clustering problem
Author :
Uykan, Zekeriya ; Ganiz, Murat C.
Author_Institution :
Electron. & Commun. Eng. Dept, Dogus Univ., Istanbul, Turkey
Abstract :
In this paper, we present a relaxed version of the SpecHybrid Algorithm, originally proposed for wireless cellular systems, and apply it to the text document clustering problem. We conduct several experiments on two datasets: a widely used benchmark dataset in English and a Turkish textual dataset commonly used in text classification. Our results show that the proposed algorithm outperforms the standard k-means algorithm in text document clustering for any number of clusters, and that, depending on the similarity matrices used, it performs comparably to or better than the standard EM algorithm for a relatively large number of clusters.
Keywords :
expectation-maximisation algorithm; pattern classification; pattern clustering; text analysis; SpecHybrid algorithm; Turkish textual dataset; similarity matrices; standard EM algorithm; standard k-means algorithm; text classification; text document clustering problem; Classification algorithms; Clustering algorithms; Data mining; Entropy; Euclidean distance; Partitioning algorithms; Turkish document clustering; document clustering; max cut; spectral clustering; textual data mining
Conference_Title :
Innovations in Intelligent Systems and Applications (INISTA), 2011 International Symposium on
Conference_Location :
Istanbul
Print_ISBN :
978-1-61284-919-5
DOI :
10.1109/INISTA.2011.5946085