• DocumentCode
    15037
  • Title
    GECC: Gene Expression Based Ensemble Classification of Colon Samples
  • Author
    Rathore, Saima ; Hussain, Mutawarra ; Khan, Ajmal
  • Author_Institution
    Dept. of Comput. & Inf. Sci., Pakistan Inst. of Eng. & Appl. Sci., Islamabad, Pakistan
  • Volume
    11
  • Issue
    6
  • fYear
    2014
  • fDate
    Nov.-Dec. 1 2014
  • Firstpage
    1131
  • Lastpage
    1145
  • Abstract
    Gene expression deviates from its normal composition when a patient has cancer, and this variation can be used as an effective tool for detecting cancer. In this study, we propose a novel gene expression based colon classification scheme (GECC) that exploits variations in gene expression to classify colon gene samples into normal and malignant classes. GECC is novel in two complementary ways. First, to cope with the very large size of gene based data sets, various feature extraction strategies, such as chi-square, F-Score, principal component analysis (PCA), and minimum redundancy and maximum relevancy (mRMR), are employed to select discriminative genes from the full set of genes. Second, a majority voting based ensemble of support vector machines (SVMs) is proposed to classify the given gene based samples. Individual SVM models have previously been used for colon classification, but their performance is limited. In the proposed SVM-ensemble approach, the individual SVM models are constructed by training with different SVM kernels: linear, polynomial, radial basis function (RBF), and sigmoid. The predictions of the individual models are combined through majority voting, which makes the combined decision space more discriminative. The proposed technique has been tested on four colon data sets and several other binary-class gene expression data sets, and it achieves improved performance compared to previously reported gene based colon cancer detection techniques. The computational time required for training and testing on the 208 × 5,851 data set was 591.01 s and 0.019 s, respectively.
  • Keywords
    bioinformatics; biological organs; cancer; feature extraction; genetics; pattern classification; polynomial approximation; principal component analysis; radial basis function networks; support vector machines; F-Score; GECC; PCA; SVM kernels; binary-class gene expression data sets; chi-square; colon gene sample classification; combined decision space; discriminative genes; feature extraction strategies; gene based colon cancer detection techniques; gene based data sets; gene expression based ensemble classification; individual SVM models; linear basis function; malignant classes; maximum relevancy; minimum redundancy; polynomial basis function; radial basis function; support vector machine; Cancer detection; Colon; Gene expression; Genomics; Sampling methods; Colon cancer; ensemble classification; gene expressions; mRMR
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2014.2344655
  • Filename
    6872581
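
The majority voting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name majority_vote, the example predictions, and the tie-breaking rule are all assumptions made for illustration. The abstract does not say how a 2-2 split among the four kernel models is resolved, so this sketch falls back to the label chosen by the earliest model in the list.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample.

    predictions_per_model: list of equal-length lists, one per model.
    Tie-breaking is an assumption of this sketch: among the labels that
    share the highest vote count, the one predicted by the earliest
    model in the list wins.
    """
    lengths = {len(p) for p in predictions_per_model}
    if len(lengths) != 1:
        raise ValueError("every model must predict the same number of samples")

    combined = []
    # zip(*...) regroups the votes so each tuple holds one sample's votes.
    for sample_votes in zip(*predictions_per_model):
        counts = Counter(sample_votes)
        top_count = max(counts.values())
        winner = next(v for v in sample_votes if counts[v] == top_count)
        combined.append(winner)
    return combined


# Hypothetical predictions (0 = normal, 1 = malignant) from four SVM
# models, one per kernel named in the abstract. The values are made up.
kernel_predictions = {
    "linear":     [0, 1, 1, 0],
    "polynomial": [0, 1, 0, 0],
    "rbf":        [1, 1, 1, 0],
    "sigmoid":    [0, 0, 1, 1],
}

ensemble_labels = majority_vote(list(kernel_predictions.values()))
print(ensemble_labels)
```

With an even number of voters and two classes, ties are possible, so any real use of this scheme needs an explicit tie rule; the one above is only one option.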