Title :
Gene selection for high dimensional data using k-means clustering algorithm and statistical approach
Author :
Ahmad, Farzana Kabir ; Yusof, Yuhanis ; Othman, Nor Hayati
Author_Institution :
Comput. Intell. Res. Cluster, Univ. Utara Malaysia, Sintok, Malaysia
Abstract :
Microarray technology can measure the expression of thousands of genes, which helps biologists study and classify cancer cells. However, this high-dimensional data contains a large number of genes to be examined relative to a small sample size. Selecting relevant genes is therefore a challenging issue in microarray data analysis and has been a central research focus. This study proposes using the k-means clustering algorithm to group relevant genes. Several statistical techniques, such as the Fisher criterion, Golub signal-to-noise ratio, Mann-Whitney rank test and t-test, are used to decide whether the clusters are well separated from one another. Genes with high discriminative scores are then used to train a k-NN classifier. The experimental results show that the proposed gene selection methods are able to identify differentially expressed genes, with a ROC score of 0.86.
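The pipeline summarized in the abstract can be sketched roughly as follows. The pipeline scores each gene, groups the genes with k-means, keeps the high-scoring group, and trains k-NN on that group. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes two classes. It runs a one-dimensional k-means (k = 2) on each gene's absolute discriminative score and keeps the cluster with the higher centroid. It classifies with Euclidean 3-NN. The function names (golub_snr, fisher_criterion, welch_t, select_genes, knn_predict) and the toy data are hypothetical. The Mann-Whitney rank score is omitted for brevity.

```python
import math
import statistics
from collections import Counter

# Hypothetical per-gene scores; a = class-1 values, b = class-0 values.
def golub_snr(a, b):
    """Golub signal-to-noise: (mu_a - mu_b) / (sd_a + sd_b)."""
    return (statistics.mean(a) - statistics.mean(b)) / (statistics.stdev(a) + statistics.stdev(b))

def fisher_criterion(a, b):
    """Fisher criterion: (mu_a - mu_b)^2 / (var_a + var_b)."""
    return (statistics.mean(a) - statistics.mean(b)) ** 2 / (statistics.variance(a) + statistics.variance(b))

def welch_t(a, b):
    """Two-sample t statistic assuming unequal variances (Welch)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def gene_scores(X, y, score=golub_snr):
    """Absolute discriminative score for every gene (column of X)."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 1]
        b = [row[g] for row, label in zip(X, y) if label == 0]
        scores.append(abs(score(a, b)))
    return scores

def kmeans_1d(values, k=2, iters=100):
    """Lloyd's k-means on scalars (k >= 2); centroids start at evenly spaced sorted values."""
    ordered = sorted(values)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    assign = []
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        assign = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        new = []
        for c in range(k):
            members = [v for v, a in zip(values, assign) if a == c]
            new.append(statistics.mean(members) if members else centroids[c])
        if new == centroids:  # converged
            break
        centroids = new
    return assign, centroids

def select_genes(X, y, score=golub_snr):
    """Keep the genes in the k-means cluster whose centroid (mean score) is highest."""
    scores = gene_scores(X, y, score)
    assign, centroids = kmeans_1d(scores, k=2)
    best = max(range(len(centroids)), key=lambda c: centroids[c])
    return [g for g, a in enumerate(assign) if a == best]

def knn_predict(train_X, train_y, x, genes, k=3):
    """Majority vote among the k nearest training samples (Euclidean, selected genes only)."""
    dists = sorted(
        (math.dist([row[g] for g in genes], [x[g] for g in genes]), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): genes 0 and 1 separate the classes, genes 2 and 3 are noise.
X = [
    [5.0, 1.0, 2.0, 3.1],
    [5.2, 1.1, 2.4, 2.9],
    [4.8, 0.9, 1.9, 3.0],
    [1.0, 4.0, 2.1, 3.2],
    [1.2, 4.2, 2.3, 2.8],
    [0.9, 3.9, 2.0, 3.0],
]
y = [1, 1, 1, 0, 0, 0]

genes = select_genes(X, y)                                # expected: [0, 1]
print(genes, knn_predict(X, y, [4.9, 1.0, 2.2, 3.0], genes))
```

Swapping `score=fisher_criterion` or `score=welch_t` into `select_genes` changes only the ranking statistic. Clustering the scores instead of applying a fixed top-N cut-off lets the data decide how many genes fall into the "high discriminative" group.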
Keywords :
biology computing; cancer; genetics; pattern classification; pattern clustering; statistical analysis; Fisher criterion; Golub signal-to-noise; Mann Whitney rank; biologist; cancer cells; gene selection methods; high dimensional data; high discriminative score; k-NN classifier; k-means clustering algorithm; microarray data analysis; microarray technology; statistical techniques; t-test; Classification algorithms; Clustering algorithms; Data analysis; Gene expression; Information filtering; Scientific computing; Gene selection; microarray; statistical techniques;
Conference_Title :
Computational Science and Technology (ICCST), 2014 International Conference on
DOI :
10.1109/ICCST.2014.7045188