Title :
Hybrid PCA-ILGC clustering approach for high dimensional data
Author :
Musdholifah, Aina ; Hashim, Siti Zaiton Mohd ; Ngah, Razali
Author_Institution :
Dept. of Software Eng., Univ. Teknol. Malaysia (UTM), Skudai, Malaysia
Abstract :
The rapid growth in the availability of high dimensional datasets has rendered conventional approaches insufficient for extracting hidden useful information. As a result, researchers today are challenged to develop new techniques for massive high dimensional data, which is large not only in the number of records but also in the number of attributes. To improve the effectiveness and accuracy of mining tasks on high dimensional data, an efficient dimensionality reduction method should be executed in the data preprocessing stage, before the clustering technique is applied. Many clustering algorithms have been proposed and used to discover useful information from a dataset. Iterative Local Gaussian Clustering (ILGC) is a simple density based clustering technique that has successfully discovered the number of clusters represented in a dataset. In this paper we propose using the Principal Component Analysis (PCA) method to preprocess the data prior to ILGC clustering, in order to simplify the analysis and visualization of multidimensional datasets. The proposed approach is validated on benchmark classification datasets. In addition, the performance of the proposed hybrid PCA-ILGC clustering approach is compared with that of the original ILGC, basic k-means, and hybridized k-means. The experimental results indicate that the proposed approach is able to obtain clusters with higher accuracy while reducing the time taken to process the data.
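The "reduce dimensionality first, then cluster" pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the details of ILGC, so the clustering step below is plain k-means standing in for it, which corresponds to the hybridized k-means baseline rather than the proposed method. The function names (`pca_reduce`, `farthest_first_centers`, `kmeans`), the farthest-first initialization, and the synthetic data are all assumptions made for illustration, not taken from the paper.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def farthest_first_centers(X, k):
    """Deterministic init (assumption): start at X[0], then repeatedly add
    the point farthest from all centers chosen so far."""
    centers = [X[0]]
    for _ in range(k - 1):
        dist_to_nearest = np.min(
            [np.linalg.norm(X - c, axis=1) for c in centers], axis=0
        )
        centers.append(X[dist_to_nearest.argmax()])
    return np.array(centers)


def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means (a stand-in for ILGC); returns one label per row."""
    centers = farthest_first_centers(X, k)
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n_points, k).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if its cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic stand-in for a high dimensional dataset: two well separated
# groups of 50 points each in 20 dimensions (not one of the paper's benchmarks).
rng = np.random.default_rng(42)
group_a = rng.normal(loc=0.0, scale=0.5, size=(50, 20))
group_b = rng.normal(loc=5.0, scale=0.5, size=(50, 20))
X = np.vstack([group_a, group_b])

X_reduced = pca_reduce(X, n_components=2)  # preprocessing stage
labels = kmeans(X_reduced, k=2)            # clustering stage
```

Clustering on `X_reduced` instead of `X` means each distance computation works on 2 features rather than 20, which is the kind of saving the abstract attributes to the PCA preprocessing stage; how much accuracy is retained depends on how much variance the kept components capture.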
Keywords :
Gaussian processes; data analysis; data mining; data visualisation; pattern classification; pattern clustering; principal component analysis; PCA method; basic k-means algorithm; benchmark classification datasets; data preprocessing stage; density based clustering technique; dimensionality reduction method; hidden useful information extraction; high dimensional data mining task; high dimensional dataset; hybrid PCA-ILGC clustering approach; hybridized k-means algorithm; iterative local Gaussian clustering; multidimensional data set analysis; multidimensional data set visualization; principal component analysis method; Accuracy; Algorithm design and analysis; Clustering algorithms; Data mining; Data visualization; Heart; Principal component analysis; Clustering; dimensionality reduction; iterative local Gaussian clustering algorithm; principal component analysis;
Conference_Title :
Systems, Man, and Cybernetics (SMC), 2012 IEEE International Conference on
Conference_Location :
Seoul
Print_ISBN :
978-1-4673-1713-9
Electronic_ISBN :
978-1-4673-1712-2
DOI :
10.1109/ICSMC.2012.6377760