DocumentCode :
3116952
Title :
Proportionate feature selection - A pre-processing step for clustering
Author :
Sekhon, Jashanjot Singh ; Gopalkrishnan, Vivekanand ; Keong, Ng Wee
Author_Institution :
Nanyang Technol. Univ., Singapore
fYear :
2008
fDate :
12-15 Oct. 2008
Firstpage :
2622
Lastpage :
2627
Abstract :
Accuracy and efficiency of clustering algorithms depend greatly on the input data. Thus, removing unimportant features from the dataset can help us form better clusters in less time. These unimportant features may be redundant or affected by noise, among other things. We also need to ensure that the features we finally choose represent the original dataset as faithfully as possible. In other words, the dataset that contains only the selected features should have the same underlying structure as the original dataset. In this paper we propose a technique that selects a subset of features that best represents the entire dataset. This technique is based on two measures: a Distance Measure and a Similarity Measure. We first group similar features and then select a proportionate number of features from each group. We perform experiments on a gene expression microarray dataset, and our experimental results show that using our technique as a pre-processing step significantly increases the quality of clusters generated by the underlying K-means algorithm. We also demonstrate that our approach outperforms other contemporary pre-processing filters.
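The abstract does not define its Distance Measure or Similarity Measure, so the Python sketch below only illustrates the general "group similar features, then pick a proportionate number from each group" idea under stated assumptions. It is not the authors' implementation. Assumptions: similarity is absolute Pearson correlation with a hypothetical grouping threshold of 0.8; the per-group budget uses largest-remainder rounding; within a group, features closest (Euclidean) to the group centroid are kept. All function names are hypothetical.

```python
import numpy as np


def standardize(X):
    """Z-score each feature (column); constant columns are left at zero."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (X - X.mean(axis=0)) / std


def group_features(Z, threshold=0.8):
    """Greedily group features by absolute Pearson correlation.

    Assumption: |Pearson r| stands in for the paper's Similarity Measure.
    The first unassigned feature becomes a seed; every other unassigned
    feature whose |r| with the seed is >= threshold joins its group.
    """
    m, n = Z.shape
    corr = np.abs(Z.T @ Z) / m  # Pearson r for z-scored columns (ddof=0)
    unassigned = list(range(n))
    groups = []
    while unassigned:
        seed, rest = unassigned[0], unassigned[1:]
        members = [seed] + [j for j in rest if corr[seed, j] >= threshold]
        groups.append(members)
        unassigned = [j for j in rest if j not in members]
    return groups


def allocate(group_sizes, k):
    """Split a budget of k features across groups in proportion to group
    size, using largest-remainder rounding so the counts sum to exactly k."""
    sizes = np.asarray(group_sizes, dtype=float)
    quotas = k * sizes / sizes.sum()
    counts = np.floor(quotas).astype(int)
    remainder = k - counts.sum()
    # Give the leftover slots to the groups with the largest fractional parts.
    order = np.argsort(-(quotas - counts), kind="stable")
    counts[order[:remainder]] += 1
    return counts


def select_within(Z, members, count):
    """Keep the `count` members closest (Euclidean) to the group centroid.

    Assumption: distance to the centroid stands in for the paper's
    Distance Measure.
    """
    block = Z[:, members]
    centroid = block.mean(axis=1, keepdims=True)
    dist = np.linalg.norm(block - centroid, axis=0)
    order = np.argsort(dist, kind="stable")
    return [members[i] for i in order[:count]]


def proportionate_select(X, k, threshold=0.8):
    """Return sorted column indices of k features chosen proportionately."""
    Z = standardize(X)
    n = Z.shape[1]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of features")
    groups = group_features(Z, threshold)
    counts = allocate([len(g) for g in groups], k)
    selected = []
    for members, count in zip(groups, counts):
        selected.extend(select_within(Z, members, int(count)))
    return sorted(selected)


if __name__ == "__main__":
    # Synthetic data: 6 noisy copies of signal a, 3 of b, 1 of c.
    rng = np.random.default_rng(0)
    m = 200
    a, b, c = rng.normal(size=(3, m))
    X = np.column_stack(
        [a + 0.1 * rng.normal(size=m) for _ in range(6)]
        + [b + 0.1 * rng.normal(size=m) for _ in range(3)]
        + [c + 0.1 * rng.normal(size=m)]
    )
    # With k=5 the group sizes 6/3/1 receive 3/2/0 features.
    print(proportionate_select(X, k=5))
```

Largest-remainder rounding is one simple way to make "proportionate" concrete while guaranteeing exactly k selected features; note that a very small group can receive zero features, which a real method might choose to prevent.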
Keywords :
data handling; pattern clustering; K-means algorithm; clustering algorithms; distance measure; gene expression microarray dataset; similarity measure; Clustering algorithms; Computational complexity; Data analysis; Data mining; Filters; Gene expression; Humans; Training data; Clustering; Distance Measure; Feature Selection; Microarray Data; Proportionate Number of Genes; Similarity Measure;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Systems, Man and Cybernetics, 2008. SMC 2008. IEEE International Conference on
Conference_Location :
Singapore
ISSN :
1062-922X
Print_ISBN :
978-1-4244-2383-5
Electronic_ISBN :
1062-922X
Type :
conf
DOI :
10.1109/ICSMC.2008.4811691
Filename :
4811691