An Initialization Method for Clustering High-Dimensional Data

Author

Chen, Luying ; Chen, Lifei ; Jiang, Qingshan ; Wang, Beizhan ; Shi, Liang

Author_Institution

Software Sch., Xiamen Univ., Xiamen, China

fYear

2009

fDate

25-26 April 2009

Firstpage

444

Lastpage

447

Abstract

In iterative refinement clustering algorithms, such as the various types of K-Means algorithms, the clustering results are very sensitive to the initial cluster centers. Conventional initialization methods tend to lose effectiveness on high-dimensional data because of the so-called "curse of dimensionality". In this paper, a local-density-based method is proposed to search for initial cluster centers in high-dimensional data. We define the probability density of a point as the weighted number of its highly similar neighbors. Points with high-density neighborhoods and low mutual similarity are chosen as the initial cluster centers. Experimental results on real-world datasets show the effectiveness of the proposed method.
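The abstract does not give the exact definitions, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes cosine similarity as the similarity measure, uses the similarity value itself as the weight coefficient, and introduces two hypothetical thresholds (`sim_threshold` for "highly similar" neighbors, `sep_threshold` for "low similarity" between chosen centers). The function name `density_init` is likewise illustrative.

```python
import numpy as np


def density_init(X, k, sim_threshold=0.8, sep_threshold=0.5):
    """Pick k initial centers: dense points that are dissimilar to each other.

    Assumptions (not from the paper): cosine similarity, similarity-valued
    weights, and the two threshold parameters.
    """
    X = np.asarray(X, dtype=float)

    # Normalize rows so that the dot product equals cosine similarity.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    U = X / norms
    S = U @ U.T
    np.fill_diagonal(S, 0.0)  # a point is not its own neighbor

    # Density: sum of similarities to neighbors above the threshold.
    density = np.where(S >= sim_threshold, S, 0.0).sum(axis=1)

    # Visit points from densest to sparsest.
    order = np.argsort(-density, kind="stable")
    chosen = [int(order[0])]
    for i in order[1:]:
        if len(chosen) == k:
            break
        # Accept only points that are dissimilar to every chosen center.
        if S[i, chosen].max() < sep_threshold:
            chosen.append(int(i))

    # Fallback: if too few points pass the separation test,
    # fill the remaining slots by density alone.
    for i in order:
        if len(chosen) == k:
            break
        if int(i) not in chosen:
            chosen.append(int(i))

    return X[chosen], chosen
```

The returned centers could then seed any K-Means-type algorithm in place of random initialization. The pairwise similarity matrix costs O(n²) memory, so a sketch like this suits only moderately sized datasets.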

Keywords

data handling; iterative methods; pattern clustering; probability; K-means algorithm; cluster center searching; curse of dimensionality; density neighborhood; high-dimensional data clustering; initialization method; iterative refinement clustering algorithm; local density based method; probability density; weight coefficient; Application software; Clustering algorithms; Computer science; Data mining; Databases; Iterative algorithms; Iterative methods; Loss measurement; Optimization methods; Software algorithms; K-Means type clustering; cluster center initialization; data mining; high-dimensional clustering; neighborhoods based density;

fLanguage

English

Publisher

IEEE

Conference_Titel

Database Technology and Applications, 2009 First International Workshop on

Conference_Location

Wuhan, Hubei

Print_ISBN

978-0-7695-3604-0

Type

conf

DOI

10.1109/DBTA.2009.87

Filename

5207723