Author/Authors :
Ram, Malihe Mashhad University of Medical Sciences , Najafi, Ali Baqiyatallah University of Medical Sciences , Shakeri, Mohammad Taghi Mashhad University of Medical Sciences
Abstract :
Background & objective: Microarray and next-generation sequencing (NGS) data
are important sources for finding useful molecular patterns. However, the large volume
of gene expression data makes it challenging to identify the biomarkers
associated with cancer. Random forest (RF) can effectively handle large-p, small-n
problems, in which the number of genes (p) far exceeds the number of samples (n).
RF can therefore be used to select and rank genes for the diagnosis and effective
treatment of cancer.
Methods: Microarray gene expression data for colon, leukemia, and prostate
cancers were collected from public databases. The data were first preprocessed
with the limma package, and the RF classification method was then applied
to each dataset separately in R. Finally, the genes selected for each cancer
were compared with those reported in previous experimental studies, and their
functions in molecular cancer processes were assessed.
Results: The RF method extracted very small sets of genes while retaining its
predictive performance. For the colon cancer dataset, the key genes DIEXF, GUCA2A,
CA7, and IGHA1 were selected, with an accuracy of 87.39 and a precision of 85.45.
For prostate cancer, the SNCA, USP20, and SNRPA1 genes were selected, with an
accuracy of 73.33 and a precision of 66.67. For the leukemia dataset, the key genes
were BAG4, ANKHD1-EIF4EBP3, PLXNC1, and PCDH9, with an accuracy of 100 and a
precision of 95.24.
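For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how accuracy and precision are conventionally computed from a binary confusion matrix and expressed as percentages. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the study, and the abstract does not state how the authors computed these figures.

```python
def accuracy_and_precision(tp, fp, fn, tn):
    """Return (accuracy, precision) as percentages rounded to two decimals.

    tp, fp, fn, tn are the counts of true positives, false positives,
    false negatives, and true negatives on a held-out test set.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Accuracy: share of all samples that were classified correctly.
    accuracy = 100.0 * (tp + tn) / total
    # Precision: share of predicted positives that are truly positive.
    predicted_positive = tp + fp
    if predicted_positive == 0:
        raise ValueError("no positive predictions; precision is undefined")
    precision = 100.0 * tp / predicted_positive
    return round(accuracy, 2), round(precision, 2)


# Hypothetical counts, for illustration only (not data from the study).
acc, prec = accuracy_and_precision(tp=40, fp=10, fn=5, tn=45)
print(f"accuracy = {acc}, precision = {prec}")
```

Because precision ignores false negatives, a classifier can reach full accuracy only if precision is also 100, so the two numbers are best read alongside the evaluation scheme that produced them.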
Conclusion: The results of the current study showed that most of the selected genes,
which are involved in cancer-related processes and pathways, had been reported
previously and play an important role in the transition of cells from normal to abnormal.
Keywords :
Microarray , Random Forest , Cancer , Gene Selection , Classification