• DocumentCode
    2465677
  • Title
    Improving Feature Subset Selection Using a Genetic Algorithm for Microarray Gene Expression Data
  • Author
    Tan, Feng ; Fu, Xuezheng ; Zhang, Yanqing ; Bourgeois, Anu G.
  • Author_Institution
    Georgia State Univ., Atlanta
  • fYear
    2006
  • fDate
    0-0 0
  • Firstpage
    2529
  • Lastpage
    2534
  • Abstract
    Microarray data usually contains a huge number of genes (features) and a comparatively small number of samples, which makes accurate classification or prediction of diseases challenging. Feature selection techniques can help us identify important and irrelevant (unimportant) features by applying certain selection criteria. However, different feature selection algorithms, based on various theoretical arguments, often produce different results when applied to the same data set. This makes it difficult to select an optimal or near-optimal feature subset for a data set. In this paper, we propose using a genetic algorithm to improve feature subset selection by combining valuable outcomes from multiple feature selection methods. The goal of our genetic algorithm is to achieve a balance between the classification accuracy and the size of the selected feature subsets. The advantages of this approach include the ability to accommodate different feature selection criteria and to find small subsets of features that perform well for the particular inductive learning algorithm used to build the classifier. The experimental results demonstrate that our approach can find subsets of features with higher classification accuracy and/or smaller size compared with each individual feature selection algorithm.
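The abstract does not specify the chromosome encoding, fitness weights, genetic operators, or classifier. The following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. Chromosomes are bit masks over the union of features picked by several selectors, and the initial population is seeded with each selector's subset. Fitness is leave-one-out accuracy of a nearest-centroid classifier (a stand-in for "the inductive learning algorithm") minus `alpha` times the fraction of pooled features kept. The names `ga_combine_subsets`, `loo_centroid_accuracy`, `alpha`, and the candidate subsets are all hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (expression values, class label)


def loo_centroid_accuracy(data: Sequence[Sample], features: Sequence[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`.

    Assumption: a simple stand-in for whatever learner builds the classifier.
    """
    if not features:
        return 0.0
    correct = 0
    for i, (x, y) in enumerate(data):
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = {}
        for j, (xj, yj) in enumerate(data):
            if j == i:
                continue  # hold out sample i
            acc = sums.setdefault(yj, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += xj[f]
            counts[yj] = counts.get(yj, 0) + 1

        def sq_dist(label: int) -> float:
            centroid = [s / counts[label] for s in sums[label]]
            return sum((x[f] - c) ** 2 for f, c in zip(features, centroid))

        if min(sums, key=sq_dist) == y:
            correct += 1
    return correct / len(data)


def ga_combine_subsets(
    data: Sequence[Sample],
    candidate_subsets: Sequence[Sequence[int]],
    alpha: float = 0.1,  # hypothetical size-penalty weight
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """GA over the union of candidate subsets; fitness trades accuracy for size."""
    rng = random.Random(seed)
    pool = sorted({f for s in candidate_subsets for f in s})  # union of all selectors' picks
    cache: Dict[Tuple[int, ...], float] = {}

    def decode(mask: List[int]) -> List[int]:
        return [f for f, bit in zip(pool, mask) if bit]

    def fitness(mask: List[int]) -> float:
        key = tuple(mask)
        if key not in cache:  # memoize: each mask is scored once
            chosen = decode(mask)
            cache[key] = loo_centroid_accuracy(data, chosen) - alpha * len(chosen) / len(pool)
        return cache[key]

    # Seed the population with each selector's subset, then pad with random masks.
    population = [[1 if f in chosen else 0 for f in pool] for chosen in map(set, candidate_subsets)]
    while len(population) < pop_size:
        population.append([rng.randint(0, 1) for _ in pool])

    best = max(population, key=fitness)
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        nxt = [scored[0][:]]  # elitism: carry the best mask over unchanged
        while len(nxt) < pop_size:
            p1 = max(rng.sample(scored, 2), key=fitness)  # size-2 tournament
            p2 = max(rng.sample(scored, 2), key=fitness)
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]  # uniform crossover
            child = [1 - g if rng.random() < mutation_rate else g for g in child]  # bit-flip mutation
            nxt.append(child)
        population = nxt
        gen_best = max(population, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best
    return decode(best), fitness(best)


if __name__ == "__main__":
    rng = random.Random(42)
    data = []
    for i in range(30):
        label = i % 2
        # Synthetic data: features 0 and 1 carry the class signal, 2-5 are noise.
        x = [label * 2.0 + rng.gauss(0, 0.5), label * 2.0 + rng.gauss(0, 0.5)]
        x += [rng.gauss(0, 1) for _ in range(4)]
        data.append((x, label))
    candidates = [[0, 2, 3], [1, 4], [0, 1, 5]]  # hypothetical outputs of three selectors
    subset, score = ga_combine_subsets(data, candidates)
    print(subset, round(score, 3))
```

Because each selector's subset is in the initial population and the best mask is tracked across generations, the returned subset never scores worse than any individual selector's subset under this fitness. A larger `alpha` pushes the search toward smaller subsets.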
  • Keywords
    diseases; feature extraction; genetic algorithms; genetics; learning by example; medical computing; pattern classification; classification accuracy; disease prediction; feature subset selection; genetic algorithm; inductive learning algorithm; microarray gene expression data; near optimal feature subset; Computer science; Diseases; Diversity reception; Gene expression; Genetic algorithms; Mutual information; Pattern classification; Scalability; Statistical analysis; Statistics;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Evolutionary Computation, 2006. CEC 2006. IEEE Congress on
  • Conference_Location
    Vancouver, BC
  • Print_ISBN
    0-7803-9487-9
  • Type
    conf
  • DOI
    10.1109/CEC.2006.1688623
  • Filename
    1688623