DocumentCode :
1693139
Title :
An analysis of document clustering algorithms
Author :
Amala Bai, V Mary ; Manimegalai, D.
Author_Institution :
Dept. of Inf. Technol., Noorul Islam Univ., Kumaracoil, India
fYear :
2010
Firstpage :
402
Lastpage :
406
Abstract :
Document clustering organizes documents into groups such that each group contains documents with similar content. This paper presents the results of an experimental study of some common document clustering techniques. In particular, it compares the Euclidean K-means (K-means), Spherical K-means (SK-means), and unsupervised Principal Direction Divisive Partitioning (PDDP) algorithms. The algorithms are compared using two evaluation measures, entropy and F-measure. The experiments were conducted on a standard dataset. K-means and SK-means are easy to implement, but their results depend strongly on initialization. PDDP is comparatively difficult to implement since it is a hierarchical algorithm; on the other hand, its performance does not depend on initial clusters. The results indicate that, for certain initial clusters, K-means and SK-means performed better than PDDP. When all classes contained equal numbers of documents, the clusters produced by the algorithms were more effective than when the classes contained different numbers of documents. Also, without stop-word removal, the quality of PDDP degraded compared to K-means.
Keywords :
document handling; pattern clustering; singular value decomposition; document clustering algorithm; Euclidean K-mean algorithm; spherical K-mean algorithm; term document matrix; unsupervised principal direction divisive partitioning; Entropy; Equations; MATLAB; Magnetic resonance imaging; Marine vehicles; Writing; Clustering; hierarchical; partitional
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Communication Control and Computing Technologies (ICCCCT), 2010 IEEE International Conference on
Conference_Location :
Ramanathapuram
Print_ISBN :
978-1-4244-7769-2
Type :
conf
DOI :
10.1109/ICCCCT.2010.5670585
Filename :
5670585