• DocumentCode
    3461361
  • Title
    Discovery of Biomarker Genes from Earthworm Microarray Data by Discriminant Analysis and Clustering
  • Author
    Ying Li ; Nan Wang ; Perkins, Edward J. ; Ping Gong
  • Author_Institution
    Univ. of Southern Mississippi, Hattiesburg, MS, USA
  • fYear
    2009
  • fDate
    3-5 Aug. 2009
  • Firstpage
    23
  • Lastpage
    29
  • Abstract
    Monitoring, assessment and prediction of environmental risks that chemicals pose demand rapid and accurate diagnostic assays. One important goal of microarray experiments is to discover novel biomarkers for toxicity evaluation. A variety of toxicological effects have been associated with the explosive compounds 2,4,6-trinitrotoluene (TNT) and 1,3,5-trinitro-1,3,5-triazacyclohexane (RDX). Here we developed a discriminant analysis and cluster (DAC) pipeline to analyze a 248-array dataset with 15,208 non-redundant earthworm (Eisenia fetida) gene probes on each array. Our objective was to identify biomarker genes that can separate earthworm samples into three groups: control (untreated), TNT-treated, and RDX-treated. First, the class comparison statistical algorithm implemented in BRB-ArrayTools was used to infer a total of 869 genes that changed significantly relative to controls as a result of exposure to TNT or RDX at various concentrations for 4 or 14 days. Then, nine tree-based supervised machine learning algorithms were applied to generate classification rules and a set of 286 classifier genes. These classifier genes were ranked by their overall weight of significance in the nine classification methods and were used to build support vector machines (SVMs). An SVM containing all 286 classifier genes had the highest classification accuracy (91.5%). Results of unsupervised clustering show that the top 100 classifier genes assign the largest number of the 248 worm samples to the three reference clusters obtained by using all 14,188 filtered genes, suggesting that these top-ranked genes may be potential candidates for biomarkers. This study demonstrates that the DAC pipeline can be used to identify a small set of biomarker genes from high-dimensional datasets and to generate a reliable SVM classification model for multiple classes.
  • Keywords
    biology computing; learning (artificial intelligence); pattern classification; pattern clustering; Eisenia fetida; SVM classification model; biomarker genes; classification rules; classifier genes; discriminant analysis; earthworm microarray data; high dimensional dataset; support vector machines; tree-based supervised machine learning algorithm; unsupervised clustering; Biomarkers; Chemicals; Classification tree analysis; Data analysis; Machine learning algorithms; Monitoring; Pipelines; Support vector machine classification; Support vector machines; Toxicology; Biomarker; Classification; Clustering; Decision tree; Earthworm Microarray; Support vector machine;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics, Systems Biology and Intelligent Computing, 2009. IJCBS '09. International Joint Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-0-7695-3739-9
  • Type
    conf
  • DOI
    10.1109/IJCBS.2009.134
  • Filename
    5260755
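
The abstract states that classifier genes were ranked by their overall weight across the nine tree-based methods, but it does not give the weighting formula. The following is a minimal sketch of one plausible scheme, assuming each method yields a non-negative weight per gene and that weights are normalized within each method before being summed. The function names, method labels, gene names, and toy weights are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def rank_genes(weights_by_method):
    """Rank genes by the sum of their per-method normalized weights.

    weights_by_method maps a method name to a dict of gene -> weight.
    Normalizing within each method is an assumption made here so that
    no single method dominates the combined score.
    """
    totals = defaultdict(float)
    for weights in weights_by_method.values():
        norm = sum(weights.values())
        if norm == 0:
            continue  # this method contributed no usable weights
        for gene, weight in weights.items():
            totals[gene] += weight / norm
    # Highest combined score first; ties are broken by gene name.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


def top_k(ranked, k):
    """Return the names of the k highest-ranked genes."""
    return [gene for gene, _ in ranked[:k]]


# Hypothetical toy input: three methods instead of nine, three genes.
example = {
    "tree_a": {"gene1": 3.0, "gene2": 1.0},
    "tree_b": {"gene2": 1.0, "gene3": 1.0},
    "tree_c": {"gene1": 1.0, "gene3": 3.0},
}
ranked = rank_genes(example)
selected = top_k(ranked, 2)
```

A top-ranked subset chosen this way (the abstract reports results for the top 100 of 286 classifier genes) could then serve as the feature set for an SVM or for unsupervised clustering.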