An improved K-Means clustering algorithm: A step forward for removal of dependency on K

Author

Chadha, Anupama ; Kumar, Sudhakar

Author_Institution

Fac. of Comput. Applic., MRIU, Faridabad, India

fYear

2014

fDate

6-8 Feb. 2014

Firstpage

136

Lastpage

140

Abstract

K-Means is one of the most popular partition-based clustering techniques. It has gained popularity because of its simplicity and its ability to classify massive data sets rapidly and efficiently. However, the output of the K-Means algorithm depends heavily on the selection of the initial cluster centers, because these centers are chosen randomly. A second limitation is that the algorithm requires the number of clusters, K, as input. Choosing an appropriate value of K calls for intuition that is sometimes difficult to come by, as it requires domain knowledge. In this paper, we propose an algorithm based on K-Means that does not require the number of clusters K as input. The time complexity and the quality of the clusters produced by the proposed algorithm are compared with those of the original K-Means using two different data sets.
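
The two limitations named in the abstract can be seen in a minimal sketch of the standard K-Means procedure (Lloyd's algorithm). This is not the algorithm proposed in the paper, which the abstract does not describe; it is only the baseline, written under the assumption that points are equal-length numeric tuples. The function names `kmeans` and `sse` and the `seed` and `max_iter` parameters are illustrative choices, not taken from the paper.

```python
import random

def kmeans(points, k, seed=None, max_iter=100):
    """Standard K-Means (Lloyd's algorithm) on a list of equal-length tuples.

    k must be supplied by the caller, and the initial centers are drawn
    at random, which are the two dependencies the abstract points out.
    """
    rng = random.Random(seed)
    # Random initialization: the result depends on which k points are drawn.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

def sse(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centers[lab]))
        for p, lab in zip(points, labels)
    )
```

Calling `kmeans(points, k, seed=s)` with different values of `s` or `k` and comparing the resulting `sse` values is one simple way to observe how much the outcome can vary with the initial centers and the chosen number of clusters.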

Keywords

computational complexity; data analysis; pattern clustering; cluster quality; data sets; domain knowledge; improved k-means clustering algorithm; initial cluster centers; massive data classification; partition based clustering technique; time complexity; Clustering algorithms; Euclidean distance; Partitioning algorithms; Proteins; K-means; accuracy; clustering; time complexity;

fLanguage

English

Publisher

IEEE

Conference_Titel

Optimization, Reliability, and Information Technology (ICROIT), 2014 International Conference on

Conference_Location

Faridabad

Print_ISBN

978-1-4799-3958-9

Type

conf

DOI

10.1109/ICROIT.2014.6798312

Filename

6798312