DocumentCode :
2754579
Title :
Efficient online spherical k-means clustering
Author :
Zhong, Shi
Author_Institution :
Dept. of Comput. Sci. & Eng., Florida Atlantic Univ., Boca Raton, FL, USA
Volume :
5
fYear :
2005
fDate :
31 July-4 Aug. 2005
Firstpage :
3180
Abstract :
The spherical k-means algorithm, i.e., the k-means algorithm with cosine similarity, is a popular method for clustering high-dimensional text data. In this algorithm, each document as well as each cluster mean is represented as a high-dimensional unit-length vector. However, it has been mainly used in batch mode. That is, each cluster mean vector is updated only after all document vectors have been assigned, as the (normalized) average of all the document vectors assigned to that cluster. This paper investigates an online version of the spherical k-means algorithm based on the well-known winner-take-all competitive learning. In this online algorithm, each cluster centroid is incrementally updated given a document. We demonstrate that the online spherical k-means algorithm can achieve significantly better clustering results than the batch version, especially when an annealing-type learning rate schedule is used. We also present heuristics that improve speed with almost no loss of clustering quality.
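The online update described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `online_spherical_kmeans`, the initialisation from randomly chosen documents, the exponentially decaying learning rate, and the parameters `eta_start` and `eta_end` are all assumptions made for illustration, and the speed-up heuristics mentioned in the abstract are not shown.

```python
import math
import random


def normalize(v):
    """Scale v to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)


def dot(u, v):
    """Inner product; equals cosine similarity for unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def online_spherical_kmeans(docs, k, epochs=10, eta_start=1.0, eta_end=0.01, seed=0):
    """Winner-take-all online clustering on the unit sphere (illustrative sketch)."""
    rng = random.Random(seed)
    docs = [normalize(d) for d in docs]
    # Assumption: centroids start as k distinct, randomly chosen documents.
    centroids = [list(d) for d in rng.sample(docs, k)]
    total_steps = epochs * len(docs)
    step = 0
    for _ in range(epochs):
        order = list(range(len(docs)))
        rng.shuffle(order)
        for i in order:
            x = docs[i]
            # Assumed annealing-type schedule: exponential decay from
            # eta_start towards eta_end over the whole run.
            eta = eta_start * (eta_end / eta_start) ** (step / total_steps)
            # Winner-take-all: only the most similar centroid is updated.
            winner = max(range(k), key=lambda j: dot(x, centroids[j]))
            moved = [m + eta * a for m, a in zip(centroids[winner], x)]
            # Re-normalise so the centroid stays unit length.
            centroids[winner] = normalize(moved)
            step += 1
    # Final assignment of every document to its most similar centroid.
    labels = [max(range(k), key=lambda j: dot(x, centroids[j])) for x in docs]
    return centroids, labels


if __name__ == "__main__":
    toy_docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
    toy_centroids, toy_labels = online_spherical_kmeans(toy_docs, k=2, epochs=5)
    print(toy_labels)
```

In contrast to batch mode, a centroid here moves immediately after each document is seen, and the shrinking learning rate makes later updates progressively smaller.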
Keywords :
pattern clustering; unsupervised learning; vectors; annealing-type learning rate schedule; cluster mean vector; cosine similarity; high-dimensional text data; k-means algorithm; online spherical k-means clustering; winner-take-all competitive learning; Annealing; Clustering algorithms; Computer science; Data engineering; Data mining; Frequency; Information filtering; Information filters; Information retrieval; Scheduling algorithm;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks, 2005. IJCNN '05. Proceedings. 2005 IEEE International Joint Conference on
Print_ISBN :
0-7803-9048-2
Type :
conf
DOI :
10.1109/IJCNN.2005.1556436
Filename :
1556436