  • DocumentCode
    3240997
  • Title
    A gene selection approach for classifying diseases based on microarray datasets
  • Author
    Soliman, Taysir Hassan A.; Sewissy, Adel A.; AbdelLatif, Hisham
  • Author_Institution
    Inf. Syst. Dept., Assiut Univ., Assiut, Egypt
  • fYear
    2010
  • fDate
    2-4 Nov. 2010
  • Firstpage
    626
  • Lastpage
    631
  • Abstract
    Gene selection is a very important problem in the classification of serious diseases in clinical information systems. A limitation of existing gene selection methods is that they may produce gene sets with some redundancy and yield an unnecessarily large number of candidate genes for classification analysis. In the current work, a hybrid approach is presented to classify diseases, such as colon cancer, leukemia, and liver cancer, based on informative genes. This hybrid approach uses clustering (K-means) with statistical analysis (ANOVA) as a preprocessing step for gene selection and Support Vector Machines (SVM) to classify diseases related to microarray experiments. To evaluate the performance of the proposed methodology, two kinds of comparisons were performed: 1) applying statistical analysis combined with a clustering algorithm (K-means) as a preprocessing step, versus statistical analysis without clustering, and 2) comparing different classification algorithms: decision tree (ID3), Naïve Bayes, Adaptive Naïve Bayes, and support vector machines. Combining clustering with statistical analysis gave a much better classification accuracy, 97%, than preprocessing without clustering. In addition, SVM achieved better accuracy than decision trees, Naïve Bayes, and Adaptive Naïve Bayes classification.
  • Keywords
    Bayes methods; decision trees; diseases; genetics; medical information systems; pattern classification; pattern clustering; statistical analysis; support vector machines; adaptive Naive Bayes method; clinical information system; clustering algorithm; decision tree; disease classification; gene selection approach; microarray dataset; statistical analysis; support vector machine; Analysis of variance; Biological system modeling; Liver; Radiation detectors; ANOVA test; Classification; Clustering; Feature Selection; Gene Selection; Microarray data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Technology and Development (ICCTD), 2010 2nd International Conference on
  • Conference_Location
    Cairo
  • Print_ISBN
    978-1-4244-8844-5
  • Electronic_ISBN
    978-1-4244-8845-2
  • Type
    conf
  • DOI
    10.1109/ICCTD.2010.5645975
  • Filename
    5645975
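
The ANOVA-based gene ranking mentioned in the Abstract could be sketched roughly as follows. This is a minimal illustration only, not the authors' procedure: the record does not give their implementation, and the function names (`anova_f`, `rank_genes`), the `top` parameter, and the data layout are hypothetical. The K-means and SVM stages are omitted.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a single gene.

    groups: list of lists, one list of expression values per class.
    """
    k = len(groups)                      # number of classes
    n = sum(len(g) for g in groups)      # total number of samples
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        # Degenerate case: no variation inside any class.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_genes(expr, labels, top=2):
    """Return the `top` gene names with the highest F statistic.

    expr:   dict mapping gene name -> list of values, one per sample
            (assumed layout, for illustration)
    labels: class label of each sample, in the same order
    """
    classes = sorted(set(labels))
    scores = {}
    for gene, values in expr.items():
        groups = [[v for v, y in zip(values, labels) if y == c] for c in classes]
        scores[gene] = anova_f(groups)
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

Genes whose expression differs strongly between classes relative to their within-class spread receive a high F value; a selection step might then keep only the top-ranked genes before clustering or classification.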