DocumentCode
2651832
Title
RELIEF-C: Efficient Feature Selection for Clustering over Noisy Data
Author
Dash, Manoranjan ; Ong, Yew-Soon
Author_Institution
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
fYear
2011
fDate
7-9 Nov. 2011
Firstpage
869
Lastpage
872
Abstract
RELIEF is a highly effective and very popular feature selection algorithm, first proposed by Kira and Rendell in 1992. Since then it has been modified and extended in various ways to make it more efficient. However, the original RELIEF and all of its extensions perform feature selection over labeled data for classification. To the best of our knowledge, this paper is the first to apply RELIEF, as RELIEF-C, to unlabeled data to select relevant features for clustering. We modified RELIEF to overcome its inherent difficulties in the presence of a large number of irrelevant features and/or a significant number of noisy tuples. RELIEF-C has several advantages over existing wrapper and filter feature selection methods: (a) it works well in the presence of a large number of noisy tuples, (b) it is robust even when the underlying clustering algorithm fails to cluster properly, and (c) it accurately identifies the relevant features even in the presence of a large number of irrelevant features. We compared RELIEF-C with two established feature selection methods for clustering. RELIEF-C significantly outperforms these methods on synthetic, benchmark, and real-world data sets, particularly when a data set contains a large number of noisy tuples and/or irrelevant features.
Keywords
pattern classification; pattern clustering; RELIEF-C; data classification; feature selection; noisy data clustering; Accuracy; Approximation algorithms; Clustering algorithms; Entropy; Machine learning algorithms; Noise measurement; Partitioning algorithms; Feature selection; High-dimensionality; RELIEF; clustering; noise;
fLanguage
English
Publisher
ieee
Conference_Title
Tools with Artificial Intelligence (ICTAI), 2011 23rd IEEE International Conference on
Conference_Location
Boca Raton, FL
ISSN
1082-3409
Print_ISBN
978-1-4577-2068-0
Electronic_ISBN
1082-3409
Type
conf
DOI
10.1109/ICTAI.2011.135
Filename
6103426