Title :
Using a clustering similarity measure for feature selection in high dimensional data sets
Author :
Santos, Jorge M. ; Ramos, Sandra
Author_Institution :
Inst. de Eng. Biomedica, Inst. Super. de Eng. do Porto, Porto, Portugal
fDate :
Nov. 29 2010-Dec. 1 2010
Abstract :
Feature selection is an important preprocessing step in data classification: it reduces the dimensionality of the problem by removing redundant or irrelevant features. High-dimensional data sets are becoming common, especially in bioinformatics, biology, signal processing and text classification, which increases the need for efficient feature selection methods. In this paper we study the applicability of a clustering validation measure, the Adjusted Rand Index (ARI), to this task and compare it with methods based on statistical tests and on the ROC curve. The experiments we performed show the validity of the proposed method.
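The abstract does not say how a single feature is turned into a partition that ARI can compare with the class labels, so the sketch below is only an illustration under stated assumptions. `adjusted_rand_index` follows the standard Hubert and Arabie formula. The per-feature step is hypothetical: `partition_by_feature` splits the samples into k equal-frequency bins (k = number of classes), and `rank_features_by_ari` scores each feature by the ARI between that partition and the labels. Neither helper is taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index (Hubert and Arabie) between two partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same samples")
    n = len(labels_a)
    # Pairs of samples placed together in both partitions (contingency cells).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each partition separately (row/column sums).
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total = comb(n, 2)
    expected = sum_a * sum_b / total if total else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial and identical
    return (sum_ij - expected) / (max_index - expected)


def partition_by_feature(values, k):
    """Hypothetical 1-D 'clustering': split samples into k equal-frequency bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * k // len(values)
    return labels


def rank_features_by_ari(X, y):
    """Hypothetical ranking: score each feature column by ARI against y.

    Returns (feature_index, ari) pairs, highest ARI first.
    """
    k = len(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((j, adjusted_rand_index(partition_by_feature(column, k), y)))
    return sorted(scores, key=lambda s: (-s[1], s[0]))


if __name__ == "__main__":
    # Toy data: feature 0 separates the two classes, feature 1 does not.
    X = [[0.1, 5.0], [0.2, 3.0], [0.3, 4.0], [0.9, 3.5], [1.0, 4.5], [1.1, 5.5]]
    y = [0, 0, 0, 1, 1, 1]
    for j, score in rank_features_by_ari(X, y):
        print(f"feature {j}: ARI = {score:.3f}")
```

On this toy data feature 0 reproduces the class partition exactly (ARI = 1), while feature 1 scores slightly below zero, i.e. no better than chance. A feature selection step would then keep the top-ranked features. Equal-frequency binning is just one simple choice; a 1-D k-means or any other clustering could be substituted in `partition_by_feature`.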
Keywords :
data handling; feature extraction; pattern classification; pattern clustering; statistical analysis; ROC curve; adjusted rand index; clustering similarity measure; data classification; feature selection methods; high dimensional data sets; statistical tests; feature selection;
Conference_Title :
Intelligent Systems Design and Applications (ISDA), 2010 10th International Conference on
Conference_Location :
Cairo
Print_ISBN :
978-1-4244-8134-7
DOI :
10.1109/ISDA.2010.5687073