DocumentCode :
2060494
Title :
Using a clustering similarity measure for feature selection in high dimensional data sets
Author :
Santos, Jorge M. ; Ramos, Sandra
Author_Institution :
Inst. de Eng. Biomedica, Inst. Super. de Eng. do Porto, Porto, Portugal
fYear :
2010
fDate :
Nov. 29 2010-Dec. 1 2010
Firstpage :
900
Lastpage :
905
Abstract :
Feature selection is an important preprocessing step in data classification. Applying it reduces the dimensionality of the problem by removing redundant or irrelevant data. High dimensional data sets are becoming common, especially in bio-informatics, biology, signal processing, and text classification, increasing the need for efficient feature selection methods. In this paper we study the applicability of a clustering validation measure, the Adjusted Rand Index (ARI), to this task, comparing it with other methods based on statistical tests and on the ROC curve. Our experiments show the validity of the proposed method.
Keywords :
data handling; feature extraction; pattern classification; pattern clustering; statistical analysis; ROC curve; adjusted Rand index; clustering similarity measure; data classification; feature selection methods; high dimensional data sets; statistical tests; feature selection
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Systems Design and Applications (ISDA), 2010 10th International Conference on
Conference_Location :
Cairo
Print_ISBN :
978-1-4244-8134-7
Type :
conf
DOI :
10.1109/ISDA.2010.5687073
Filename :
5687073