• DocumentCode
    2207679
  • Title
    A Variance Reduction Framework for Stable Feature Selection
  • Author
    Han, Yue ; Yu, Lei
  • Author_Institution
    Dept. of Comput. Sci., Binghamton Univ., Binghamton, NY, USA
  • fYear
    2010
  • fDate
    13-17 Dec. 2010
  • Firstpage
    206
  • Lastpage
    215
  • Abstract
    Besides high accuracy, stability of feature selection has recently attracted strong interest in knowledge discovery from high-dimensional data. In this study, we present a theoretical framework for the relationship between the stability and accuracy of feature selection, based on a formal bias-variance decomposition of feature selection error. The framework also suggests a variance reduction approach for improving the stability of feature selection algorithms. Furthermore, we propose an empirical variance reduction framework, margin based instance weighting, which weights training instances according to their influence on the estimation of feature relevance. We also develop an efficient algorithm under this framework. Experiments based on synthetic data and real-world microarray data verify both the theoretical framework and the effectiveness of the proposed algorithm on variance reduction. The proposed algorithm is also shown to be effective at improving subset stability, while maintaining comparable classification accuracy based on selected features.
  • Keywords
    Monte Carlo methods; data mining; feature extraction; lab-on-a-chip; stability; classification accuracy; feature selection error; formal bias variance decomposition; high dimensional data; knowledge discovery; margin based instance weighting; real world microarray data; stable feature selection; subset stability; synthetic data; variance reduction framework; bias-variance decomposition; feature selection; high-dimensional data; variance reduction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2010 IEEE 10th International Conference on
  • Conference_Location
    Sydney, NSW
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4244-9131-5
  • Electronic_ISBN
    1550-4786
  • Type
    conf
  • DOI
    10.1109/ICDM.2010.144
  • Filename
    5693974