• DocumentCode
    1468375
  • Title
    Stable Gene Selection from Microarray Data via Sample Weighting
  • Author
    Lei Yu; Yue Han; M.E. Berens
  • Author_Institution
    Dept. of Comput. Sci., State Univ. of New York, Binghamton, NY, USA
  • Volume
    9
  • Issue
    1
  • fYear
    2012
  • Firstpage
    262
  • Lastpage
    272
  • Abstract
    Feature selection from gene expression microarray data is a widely used technique for selecting candidate genes in various cancer studies. Besides the predictive ability of the selected genes, an important aspect in evaluating a selection method is the stability of the selected genes. Experts instinctively have high confidence in the result of a selection method that selects similar sets of genes under variations in the samples. However, a common problem of existing feature selection methods for gene expression data is that the genes selected by the same method often vary significantly with sample variations. In this work, we propose a general framework of sample weighting to improve the stability of feature selection methods under sample variations. The framework first weights each sample in a given training set according to its influence on the estimation of feature relevance, and then provides the weighted training set to a feature selection method. We also develop an efficient margin-based sample weighting algorithm under this framework. Experiments on a set of microarray data sets show that the proposed algorithm significantly improves the stability of representative feature selection algorithms such as SVM-RFE and ReliefF, without sacrificing their classification performance. Moreover, the proposed algorithm also leads to more stable gene signatures than the state-of-the-art ensemble method, particularly for small signature sizes.
  • Keywords
    arrays; biology computing; feature extraction; genetics; support vector machines; ReliefF algorithm; SVM-RFE algorithm; feature relevance estimation; feature selection method; gene expression microarray data; gene selection; margin-based sample weighting algorithm; weighted training set; Bioinformatics; Cancer; Gene expression; Monte Carlo methods; Stability analysis; Support vector machines; Training; Feature selection; classification; gene expression microarray; gene selection; stability; Computational Biology; Data Mining; Gene Expression Profiling; Genes; Humans; Models, Genetic; Neoplasms; Oligonucleotide Array Sequence Analysis; Support Vector Machines
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2011.47
  • Filename
    5728792
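The abstract evaluates gene selection methods by the stability of the selected gene sets under sample variations. As a minimal sketch of how that stability can be quantified, the code below assumes Kuncheva's consistency index between two equal-size gene subsets, averaged over all pairs of runs on resampled training sets. The measure actually used in the paper may differ, and the helper names `kuncheva_index` and `selection_stability`, as well as the example gene sets, are hypothetical illustrations rather than the authors' code or data.

```python
from itertools import combinations
from typing import Sequence, Set


def kuncheva_index(a: Set[int], b: Set[int], n_features: int) -> float:
    """Kuncheva consistency index between two equal-size gene subsets.

    a, b: indices of selected genes, both of size k.
    n_features: total number of genes n in the data set.
    Returns (r*n - k^2) / (k*(n - k)) with r = |a & b|, a value in [-1, 1]
    that corrects the raw overlap for agreement expected by chance.
    """
    k = len(a)
    if len(b) != k:
        raise ValueError("subsets must have equal size")
    if not 0 < k < n_features:
        raise ValueError("require 0 < k < n_features")
    r = len(a & b)
    return (r * n_features - k * k) / (k * (n_features - k))


def selection_stability(subsets: Sequence[Set[int]], n_features: int) -> float:
    """Average pairwise Kuncheva index over gene sets selected by the same
    method on different resamples of the training data (hypothetical helper)."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(kuncheva_index(a, b, n_features) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical top-3 gene sets from three resamples of a 100-gene data set.
    runs = [{1, 2, 3}, {1, 2, 4}, {1, 2, 3}]
    print(round(selection_stability(runs, 100), 4))  # prints 0.7709
```

A value near 1 means the method keeps choosing the same genes across resamples; the chance correction matters most when the signature size k is small relative to n, which is the regime the abstract highlights.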