• DocumentCode
    744109
  • Title

    Comparison of Feature Selection Methods for Cross-Laboratory Microarray Analysis

  • Author

    Hsi-Che Liu ; Pei-Chen Peng ; Tzung-Chien Hsieh ; Ting-Chi Yeh ; Chih-Jen Lin ; Chien-Yu Chen ; Jen-Yin Hou ; Lee-Yung Shih ; Der-Cherng Liang

  • Author_Institution
    Div. of Pediatric Hematology-Oncology, Mackay Med. Coll., Taipei, Taiwan
  • Volume
    10
  • Issue
    3
  • fYear
    2013
  • Firstpage
    593
  • Lastpage
    604
  • Abstract
    The amount of microarray gene expression data has grown exponentially. To exploit these data in large-scale studies, integrated analysis of cross-laboratory (cross-lab) data has become a trend, which makes choosing an appropriate feature selection method essential. This paper focuses on feature selection for Affymetrix (Affy) microarray studies across different labs. We investigate four feature selection methods: t-test, significance analysis of microarrays (SAM), rank products (RP), and random forest (RF). The four methods are applied to Affy data for acute lymphoblastic leukemia, acute myeloid leukemia, breast cancer, and lung cancer, each comprising three cross-lab data sets. We use a rank-based normalization method to reduce the bias among cross-lab data sets. We consider both training on one data set and training on two combined data sets, testing on the remaining data set(s) in each case. Balanced accuracy is used to evaluate prediction. This study provides comprehensive comparisons of the four feature selection methods in cross-lab microarray analysis. Results show that SAM has the best classification performance. RF also achieves high classification accuracy, but it is not as stable as SAM. The t-test is the most naive method, and its performance is the worst among the four. We further discuss the influence of the number of training samples, the number of selected genes, and the issue of unbalanced data sets.
  • Keywords
    bioinformatics; cancer; feature selection; genetics; lab-on-a-chip; learning (artificial intelligence); medical computing; Affymetrix microarray; acute lymphoblastic leukemia; acute myeloid leukemia; breast cancer; cross-laboratory microarray analysis; feature selection methods; gene expression; lung cancer; random forest; rank products; rank-based normalization method; significance analysis; t-test; Microarray data analysis; cross-laboratory experiment
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2013.70
  • Filename
    6531614
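
The abstract names two generic techniques that a short sketch can make concrete: rank-based normalization and balanced accuracy. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the exact normalization, so `rank_normalize` assumes a simple per-sample scheme (average ranks scaled by the number of genes), and `balanced_accuracy` assumes the common definition as the mean of per-class recall. Both function names are hypothetical.

```python
from collections import defaultdict


def rank_normalize(sample):
    """Replace each value with its average rank (1 = lowest), divided by the sample size.

    Assumed scheme for illustration; the paper's exact method is not given in the abstract.
    """
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    ranks = [0.0] * len(sample)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values.
        while j + 1 < len(order) and sample[order[j + 1]] == sample[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / len(sample)
        i = j + 1
    return ranks


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of its size."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


if __name__ == "__main__":
    # Hypothetical toy data: three expression values from one sample.
    print(rank_normalize([5.0, 1.0, 3.0]))
    # A classifier that always predicts the majority class on an unbalanced test set.
    print(balanced_accuracy([1, 1, 1, 1, 0], [1, 1, 1, 1, 1]))
```

Because ranks depend only on the ordering of values within a sample, this kind of normalization is insensitive to lab-specific scaling of intensities. Balanced accuracy, in turn, does not reward a classifier for simply predicting the majority class, which is relevant to the unbalanced data sets the abstract mentions.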