MSGKA: an efficient clustering algorithm for large databases

Author

Tsai, Cheng-Fa ; Chen, Zhi-Cheng ; Tsai, Chun-Wei

Author_Institution

Dept. of Manage. Inf. Syst., Nat. Pingtung Univ. of Sci. & Technol., Taiwan

Volume

5

fYear

2002

fDate

6-9 Oct. 2002

Abstract

This investigation presents an efficient clustering algorithm for large databases. We present a novel multiple-searching genetic algorithm (MSGA) that finds a globally optimal partition of a given data set into a specified number of clusters. We hybridize MSGA with a multiple-searching approach used in clustering, namely the K-means algorithm, hence the name multiple-searching genetic K-means algorithm (MSGKA). Our simulation results reveal that the proposed clustering approach performs better than both the Fast SOM combined with K-means approach (FSOM+K-means) and the Genetic K-Means Algorithm (GKA). Moreover, in all the cases we studied, our approach produces much smaller errors than both FSOM+K-means and GKA.
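
The abstract does not spell out the operators of MSGKA, so the following is only a minimal sketch of the general idea of hybridizing a genetic algorithm with K-means, not the authors' method. It assumes a GKA-style scheme in which each chromosome is a list of cluster labels, fitness is the total within-cluster squared error, and a single K-means step takes the place of crossover. All function names, parameters, and default values are hypothetical, and the "multiple-searching" component is not reproduced.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids(points, labels, k):
    """Mean of each cluster; an empty cluster gets None (assumption)."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, l in zip(points, labels):
        counts[l] += 1
        for d in range(dim):
            sums[l][d] += p[d]
    return [[s / counts[c] for s in sums[c]] if counts[c] else None
            for c in range(k)]


def sse(points, labels, k):
    """Fitness: total squared distance of points to their own centroid."""
    cents = centroids(points, labels, k)
    return sum(dist2(p, cents[l]) for p, l in zip(points, labels))


def kmeans_step(points, labels, k):
    """One K-means step: reassign each point to its nearest non-empty centroid."""
    cents = centroids(points, labels, k)
    live = [c for c in range(k) if cents[c] is not None]
    return [min(live, key=lambda c: dist2(p, cents[c])) for p in points]


def mutate(labels, k, rate, rng):
    """Return a copy in which each label is randomized with probability `rate`."""
    return [rng.randrange(k) if rng.random() < rate else l for l in labels]


def genetic_kmeans(points, k, pop_size=10, generations=20,
                   mutation_rate=0.05, seed=0):
    """Hypothetical GA + K-means hybrid; returns (best_labels, best_error)."""
    rng = random.Random(seed)

    def fit(labels):
        return sse(points, labels, k)

    n = len(points)
    population = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    best = min(population, key=fit)
    for _ in range(generations):
        # Selection: binary tournament on the squared-error fitness.
        selected = []
        for _ in range(pop_size):
            a, b = rng.choice(population), rng.choice(population)
            selected.append(a if fit(a) <= fit(b) else b)
        # Mutation followed by a K-means step (used here instead of crossover).
        population = [kmeans_step(points, mutate(s, k, mutation_rate, rng), k)
                      for s in selected]
        candidate = min(population, key=fit)
        if fit(candidate) < fit(best):
            best = candidate
        population[0] = best  # elitism: the best solution found so far survives
    return best, fit(best)
```

A call such as `genetic_kmeans(points, k=2)` on a list of coordinate tuples returns the best label vector found together with its squared error. The mutation step lets the search leave a poor local partition that plain K-means would stay in, while the K-means step speeds up convergence of each candidate.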

Keywords

data mining; database theory; genetic algorithms; pattern clustering; search problems; very large databases; Fast SOM combined K-means approach; Genetic K-Means Algorithm; K-means algorithm; MSGKA; clustering algorithm; errors; large databases; multiple-searching genetic K-means algorithm; multiple-searching genetic algorithm; simulation; Biological cells; Clustering algorithms; Costs; Databases; Machine learning; Partitioning algorithms; Pattern recognition; Statistics

fLanguage

English

Publisher

IEEE

Conference_Titel

Systems, Man and Cybernetics, 2002 IEEE International Conference on

ISSN

1062-922X

Print_ISBN

0-7803-7437-1

Type

conf

DOI

10.1109/ICSMC.2002.1176400

Filename

1176400