• DocumentCode
    2106497
  • Title
    Feature Selection of Imbalanced Gene Expression Microarray Data
  • Author
    Anaissi, Ali ; Kennedy, Paul J. ; Goyal, Madhu
  • Author_Institution
    Center of Quantum Comput. & Intell. Syst. (QCIS), Univ. of Technol., Broadway, NSW, Australia
  • fYear
    2011
  • fDate
    6-8 July 2011
  • Firstpage
    73
  • Lastpage
    78
  • Abstract
    Gene expression data sets are highly complex, characterised by a very large number of features but a small number of observations. However, only a small number of these features are relevant to an outcome of interest. With this kind of data set, feature selection becomes a real prerequisite. This paper proposes a methodology for feature selection on imbalanced leukaemia gene expression data based on the random forest algorithm. It presents the importance of feature selection in terms of reducing the number of features, enhancing the quality of machine learning, and giving biologists a better understanding for diagnosis and prediction. Algorithms are presented to show the methodology and strategy for feature selection, taking care to avoid overfitting. Moreover, experiments are carried out on imbalanced leukaemia gene expression data, and a special measure is used to evaluate the quality of feature selection and the performance of classification.
  • Keywords
    biology computing; diseases; learning (artificial intelligence); pattern classification; biologists; classification performance; feature selection; imbalanced gene expression microarray data; imbalanced leukaemia gene expression data; machine learning; random forest algorithm; Accuracy; Classification algorithms; Gene expression; Intelligent systems; Prediction algorithms; Training; Vegetation; cost sensitive learning; feature selection; imbalanced data; random forest;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), 2011 12th ACIS International Conference on
  • Conference_Location
    Sydney, NSW
  • Print_ISBN
    978-1-4577-0896-1
  • Type
    conf
  • DOI
    10.1109/SNPD.2011.12
  • Filename
    6063547