Title :
An improved k-means clustering algorithm based on dissimilarity
Author_Institution :
Dept. of Comput. Sci. & Technol., Langfang Teachers Coll., Langfang, China
Abstract :
The k-means clustering algorithm is one of the most widely used clustering algorithms and has been applied in many fields of science and technology. A major problem of the original k-means algorithm is that the clustering results depend on the initial centroids, which are chosen at random. In addition, its distance-based similarity measure is not suitable for large high-dimensional datasets. Both issues lead to severe degradation in performance. In this paper, an improved k-means clustering algorithm based on dissimilarity is proposed. It selects the initial centroids using a Huffman tree constructed from the dissimilarity matrix. Experiments confirm that the proposed algorithm is efficient, achieving better clustering accuracy at the same time complexity.
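The abstract does not spell out how the Huffman tree is built, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes Euclidean distance as the dissimilarity measure and assumes that, as in Huffman construction, the two least dissimilar nodes are merged repeatedly (here into their weighted mean) until k nodes remain. The names `dissimilarity`, `huffman_initial_centroids`, and `kmeans` are hypothetical.

```python
import math


def dissimilarity(a, b):
    # Assumption: Euclidean distance stands in for the paper's dissimilarity measure.
    return math.dist(a, b)


def huffman_initial_centroids(points, k):
    """Merge the two least dissimilar nodes, Huffman-style, until k nodes remain."""
    # Each node is (position, weight); weight counts the points merged into it.
    nodes = [(tuple(p), 1) for p in points]
    while len(nodes) > k:
        best = None
        # Scan all pairs for the smallest dissimilarity (recomputed each round).
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                d = dissimilarity(nodes[i][0], nodes[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (ci, wi), (cj, wj) = nodes[i], nodes[j]
        # The parent node sits at the weighted mean of its two children.
        merged = tuple((wi * x + wj * y) / (wi + wj) for x, y in zip(ci, cj))
        nodes.pop(j)  # j > i, so remove j first to keep index i valid
        nodes.pop(i)
        nodes.append((merged, wi + wj))
    return [c for c, _ in nodes]


def kmeans(points, centroids, iterations=100):
    """Standard Lloyd iterations starting from the given centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: dissimilarity(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as its cluster mean; keep it if the cluster is empty.
        updated = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
initial = huffman_initial_centroids(points, k=2)
final = kmeans(points, initial)
```

Because the seeding step is deterministic, repeated runs on the same data start from the same centroids, which is the property the abstract contrasts with random initialization.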
Keywords :
computational complexity; matrix algebra; pattern clustering; Huffman tree; algorithm time complexity; big high-dimensional dataset; cluster results; dissimilarity matrix; initial centroids; k-means clustering algorithm; Accuracy; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Data mining; Iris; Machine learning algorithms; dissimilarity; k-means;
Conference_Title :
Mechatronic Sciences, Electric Engineering and Computer (MEC), Proceedings 2013 International Conference on
Conference_Location :
Shenyang
Print_ISBN :
978-1-4799-2564-3
DOI :
10.1109/MEC.2013.6885476