• DocumentCode
    3564125
  • Title

    Gene selection for high dimensional data using k-means clustering algorithm and statistical approach

  • Author

    Ahmad, Farzana Kabir ; Yusof, Yuhanis ; Othman, Nor Hayati

  • Author_Institution
    Comput. Intell. Res. Cluster, Univ. Utara Malaysia, Sintok, Malaysia
  • fYear
    2014
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Microarray technology can measure the expression of thousands of genes, which helps biologists study and classify cancer cells. However, such high-dimensional data contain a large number of genes to be examined relative to a small number of samples. Selecting relevant genes is therefore a challenging problem in microarray data analysis and has been a central research focus. This study proposes a k-means clustering algorithm to group the relevant genes. Several statistical techniques, such as the Fisher criterion, Golub signal-to-noise, the Mann-Whitney rank test, and the t-test, are used to decide whether the clusters are well separated from one another. The genes with high discriminative scores are then used to train a k-NN classifier. The experimental results show that the proposed gene selection methods are able to identify differentially expressed genes with an ROC score of 0.86.
  • Keywords
    biology computing; cancer; genetics; pattern classification; pattern clustering; statistical analysis; Fisher criterion; Golub signal-to-noise; Mann Whitney rank; biologist; cancer cells; gene selection methods; high dimensional data; high discriminative score; k-NN classifier; k-means clustering algorithm; microarray data analysis; microarray technology; statistical techniques; t-test; Classification algorithms; Clustering algorithms; Data analysis; Gene expression; Information filtering; Scientific computing; Gene selection; microarray; statistical techniques;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Science and Technology (ICCST), 2014 International Conference on
  • Type

    conf

  • DOI
    10.1109/ICCST.2014.7045188
  • Filename
    7045188
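
The abstract names the Fisher criterion and the Golub signal-to-noise ratio as discriminative scores but does not give their formulas. The sketch below uses the commonly cited two-class definitions, which are an assumption here and may differ from what the paper implements. The gene names and expression values are hypothetical toy data, not results from the paper.

```python
from statistics import mean, stdev


def golub_s2n(class_a, class_b):
    """Golub signal-to-noise: (mean_a - mean_b) / (sd_a + sd_b).

    Assumes each class has at least two samples and nonzero spread.
    """
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))


def fisher_score(class_a, class_b):
    """Fisher criterion: (mean_a - mean_b)^2 / (var_a + var_b)."""
    diff = mean(class_a) - mean(class_b)
    return diff ** 2 / (stdev(class_a) ** 2 + stdev(class_b) ** 2)


# Hypothetical expression values per gene: (class A samples, class B samples).
genes = {
    "gene_1": ([1.0, 2.0, 3.0], [7.0, 8.0, 9.0]),  # classes far apart
    "gene_2": ([4.0, 5.0, 6.0], [4.5, 5.5, 6.5]),  # classes overlap
}

# Rank genes from most to least discriminative by Fisher score.
ranked = sorted(genes, key=lambda g: fisher_score(*genes[g]), reverse=True)
print(ranked)
```

Genes at the top of such a ranking would be the candidates passed on to a classifier such as k-NN; how the paper combines these scores with the k-means clusters is not stated in this record.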