DocumentCode
2060494
Title
Using a clustering similarity measure for feature selection in high dimensional data sets
Author
Santos, Jorge M. ; Ramos, Sandra
Author_Institution
Inst. de Eng. Biomedica, Inst. Super. de Eng. do Porto, Porto, Portugal
fYear
2010
fDate
Nov. 29 2010-Dec. 1 2010
Firstpage
900
Lastpage
905
Abstract
Feature selection is a very important preprocessing step in data classification. By applying it, we can reduce the dimensionality of the problem by removing redundant or irrelevant data. High-dimensional data sets are becoming increasingly common, especially in bioinformatics, biology, signal processing and text classification, which increases the need for efficient feature selection methods. In this paper we study the applicability of a clustering validation measure, the Adjusted Rand Index (ARI), to this task, comparing it with other methods based on statistical tests and on the ROC curve. The experiments we performed show the validity of the proposed method.
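The abstract does not say how the ARI is turned into a per-feature score, so the following is only a hypothetical sketch, not the authors' procedure. It assumes each feature is discretized into k equal-width bins (k defaulting to the number of classes). The resulting partition is then compared with the class labels using the ARI, ARI = (Index - Expected) / (MaxIndex - Expected), computed from pair counts in the contingency table. Features are ranked by that score. The function names `adjusted_rand_index`, `equal_width_bins` and `rank_features_by_ari` and the binning choice are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions of the same n items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must have the same length")
    n = len(labels_a)
    # Contingency table cells n_ij, row sums a_i and column sums b_j.
    pair_counts = Counter(zip(labels_a, labels_b))
    row_sums = Counter(labels_a)
    col_sums = Counter(labels_b)

    index = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in row_sums.values())
    sum_b = sum(comb(c, 2) for c in col_sums.values())
    total_pairs = comb(n, 2)
    expected = sum_a * sum_b / total_pairs if total_pairs else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case (e.g. both partitions trivial): treat as perfect agreement.
        return 1.0
    return (index - expected) / (max_index - expected)


def equal_width_bins(values, k):
    """Hypothetical discretization: map each value to one of k equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def rank_features_by_ari(X, y, k=None):
    """Score each column of X by the ARI between its binned values and labels y."""
    if k is None:
        k = len(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((j, adjusted_rand_index(equal_width_bins(column, k), y)))
    # Highest agreement with the class partition first.
    return sorted(scores, key=lambda s: s[1], reverse=True)


X = [[0.1, 5.0], [0.2, 1.0], [0.15, 9.0], [0.9, 2.0], [0.8, 7.0], [0.95, 4.0]]
y = [0, 0, 0, 1, 1, 1]
ranking = rank_features_by_ari(X, y)
print(ranking)
```

In this toy data, feature 0 separates the two classes perfectly and scores 1.0. Feature 1 scores slightly below zero, which means chance-level agreement. Because the ARI is corrected for chance, an irrelevant feature tends toward 0 rather than toward some positive baseline. A feature selector could then keep the top-ranked columns.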
Keywords
data handling; feature extraction; pattern classification; pattern clustering; statistical analysis; ROC curve; adjusted rand index; clustering similarity measure; data classification; feature selection methods; high dimensional data sets; statistical tests; feature selection
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Systems Design and Applications (ISDA), 2010 10th International Conference on
Conference_Location
Cairo
Print_ISBN
978-1-4244-8134-7
Type
conf
DOI
10.1109/ISDA.2010.5687073
Filename
5687073