• DocumentCode
    1654489
  • Title
    Classification and Identification of Differential Gene Expression for Microarray Data: Improvement of the Random Forest Method
  • Author
    Wu, Xiaoyan; Wu, Zhenyu; Li, Kang
  • Author_Institution
    Dept. of Biostat., Harbin Med. Univ., Harbin
  • fYear
    2008
  • Firstpage
    763
  • Lastpage
    766
  • Abstract
    Classification and gene selection are important aspects of the analysis of microarray gene expression data in biomedical research. The high dimensionality of gene expression data presents a new challenge for statistical methods, and random forest has been used to address it. We present a new classifier, Recursive Random Forest, which selects genes automatically and improves on the classification accuracy of random forest. Three microarray datasets (ALL-AML Leukemia data, Colon Cancer data and Prostate Cancer data) were analyzed using Recursive Random Forest. Although only a few genes were selected from the microarray data, they were effective for cancer prediction, and their biological functions have been confirmed. On the ALL-AML Leukemia data in particular, the method achieved perfect accuracy on the test set using only three genes (selected from over 7000). We also investigate the properties of random forest and Recursive Random Forest in simulation experiments. Because it performs gene selection, Recursive Random Forest provides more useful information than random forest alone for further biological experiments, clinical diagnosis and disease therapy, and it could become an excellent tool for sample classification and gene selection on microarray data. Source code written in R for Recursive Random Forest is available from http://vxzv.hrbmu.edu.cn/gongwei/biostatistics/.
  • Keywords
    DNA; cancer; genetics; lab-on-a-chip; statistical analysis; ALL-AML Leukemia data; Colon Cancer data; differential gene expression; gene selection; microarray data; recursive random forest; Biological information theory; Biological system modeling; Colon; Data analysis; Diseases; Gene expression; Medical treatment; Prostate cancer; Statistical analysis; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedical Engineering, 2008. ICBBE 2008. The 2nd International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4244-1747-6
  • Electronic_ISBN
    978-1-4244-1748-3
  • Type
    conf
  • DOI
    10.1109/ICBBE.2008.186
  • Filename
    4535066