Title :
A hybrid algorithm of minimum spanning tree and nearest neighbor for classifying human cancers
Author :
Zhou, Chunbao ; Wan, Liming ; Liang, Yanchun
Author_Institution :
Key Lab. of Symbol Comput. & Knowledge Eng. of the Minist. of Educ., Jilin Univ., Changchun, China
Abstract :
Classification and prediction of different cancers based on gene expression profiles are important for cancer diagnosis, cancer treatment and medication discovery. The k nearest neighbor algorithm (k-NN) is one easy and efficient machine learning method for cancer classification and the parameter k is crucial. In this paper, we integrate minimum spanning tree (MST) and k nearest neighbor algorithm (k-NN) for cancer classification. The MST is designed for the selection of parameter k and the nearest neighbors for k-NN. Firstly we build a minimum spanning tree (MST) based on Euclidean distance between each two samples for gene expression data only including one unknown class sample. Secondly for unknown class sample in the gene expression data, we find the connected samples and then apply majority vote principle. Thirdly if there are tied votes then we expend the connected samples with the nearest neighbors for unknown class sample until all the samples are expended or the class for unknown sample is obtained. This hybrid algorithm is referred to as MSTNN. The hybrid algorithm MSTNN is compared with k-NN and other 3 existing classification algorithms on CNS dataset, Colon dataset and Lung dataset, 3 binary class gene expression datasets and 3 multi-class gene expression datasets: Leukemia1, Leukemia2, and Leukemia3 involving human cancers. The MSTNN algorithm improves 5.65% better than k-NN on average LOOCV accuracy and 13.80% better than k-NN on testing datasets classification average accuracy, and achieves the best performance in all the 5 algorithms. The results demonstrate that the proposed MSTNN algorithm is feasible to classify human cancers.
Keywords :
cancer; genetics; medical diagnostic computing; patient diagnosis; patient treatment; pattern classification; trees (mathematics); Euclidean distance; cancer diagnosis; cancer treatment; gene expression dataset; human cancer classification; hybrid algorithm; medication discovery; minimum spanning tree; nearest neighbor algorithm; Artificial neural networks; Cancer; Classification algorithms; Colon; Lungs; Niobium; Support vector machines; cancer classification; gene expression profile; k-nearest-neighbor; minimum spanning tree;
Conference_Titel :
Advanced Computer Theory and Engineering (ICACTE), 2010 3rd International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-6539-2
DOI :
10.1109/ICACTE.2010.5579426