• DocumentCode
    2060494
  • Title
    Using a clustering similarity measure for feature selection in high dimensional data sets
  • Author
    Santos, Jorge M. ; Ramos, Sandra
  • Author_Institution
    Inst. de Eng. Biomedica, Inst. Super. de Eng. do Porto, Porto, Portugal
  • fYear
    2010
  • fDate
    Nov. 29 2010-Dec. 1 2010
  • Firstpage
    900
  • Lastpage
    905
  • Abstract
    Feature selection is a very important preprocessing step in data classification. Applying it reduces the dimensionality of the problem by removing redundant or irrelevant data. High dimensional data sets are becoming common, especially in bioinformatics, biology, signal processing and text classification, increasing the need for efficient feature selection methods. In this paper we study the applicability of a clustering validation measure, the Adjusted Rand Index (ARI), to this task, comparing it with other methods based on statistical tests and on the ROC curve. Our experiments show the validity of the proposed method.
  • Keywords
    data handling; feature extraction; pattern classification; pattern clustering; statistical analysis; ROC curve; adjusted Rand index; clustering similarity measure; data classification; feature selection methods; high dimensional data sets; statistical tests; feature selection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems Design and Applications (ISDA), 2010 10th International Conference on
  • Conference_Location
    Cairo
  • Print_ISBN
    978-1-4244-8134-7
  • Type
    conf
  • DOI
    10.1109/ISDA.2010.5687073
  • Filename
    5687073