DocumentCode :
2954532
Title :
Dataset complexity can help to generate accurate ensembles of k-nearest neighbors
Author :
Okun, Oleg ; Valentini, Giorgio
Author_Institution :
Dept. of Electr. & Inf. Eng., Univ. of Oulu, Oulu
fYear :
2008
fDate :
1-8 June 2008
Firstpage :
450
Lastpage :
457
Abstract :
This work focuses on gene-expression-based cancer classification using classifier ensembles. A new ensemble method is proposed that combines the predictions of a small number of k-nearest neighbor (k-NN) classifiers by majority vote. Diversity of predictions is guaranteed by assigning each classifier a separate feature subset, randomly sampled from the original set of features. Accuracy of the k-NNs is ensured by the statistically confirmed dependence between classification error and dataset complexity, which measures how difficult a dataset is to classify. Experiments on three gene expression datasets covering different types of cancer show that the ensemble method is superior to 1) the single best classifier in the ensemble, 2) the nearest shrunken centroids method originally proposed for gene expression data, and 3) the traditional ensemble construction scheme that does not take dataset complexity into account.
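The scheme outlined in the abstract (random feature subsets, one k-NN per subset, majority vote, with subsets screened by dataset complexity) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the complexity measure, so the leave-one-out 1-NN error rate is used here as a stand-in proxy, and the function names, toy data, and parameters (`n_members`, `subset_size`, `n_candidates`) are all hypothetical.

```python
import random
from collections import Counter


def sq_dist(a, b, feats):
    """Squared Euclidean distance restricted to the given feature indices."""
    return sum((a[f] - b[f]) ** 2 for f in feats)


def knn_predict(X, y, query, feats, k=3):
    """Plain k-NN vote using only the features in `feats`."""
    order = sorted(range(len(X)), key=lambda i: sq_dist(X[i], query, feats))
    votes = Counter(y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_1nn_error(X, y, feats):
    """ASSUMED complexity proxy: leave-one-out 1-NN error on this subset."""
    errors = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], feats, k=1) != y[i]:
            errors += 1
    return errors / len(X)


def build_ensemble(X, y, n_members=3, subset_size=2, n_candidates=20, seed=0):
    """Sample random feature subsets and keep the least complex ones."""
    rng = random.Random(seed)
    n_features = len(X[0])
    candidates = {
        tuple(sorted(rng.sample(range(n_features), subset_size)))
        for _ in range(n_candidates)
    }
    # Lower proxy value = easier subset; ties are broken by feature indices.
    ranked = sorted(candidates, key=lambda f: (loo_1nn_error(X, y, f), f))
    return ranked[:n_members]


def ensemble_predict(X, y, ensemble, query, k=3):
    """Majority vote over one k-NN classifier per selected feature subset."""
    votes = Counter(knn_predict(X, y, query, feats, k) for feats in ensemble)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): features 0 and 1 separate the classes, 2 and 3 are noise.
X = [
    [0.0, 0.2, 0.9, 0.1], [0.3, 0.1, 0.2, 0.8],
    [0.1, 0.4, 0.6, 0.3], [0.2, 0.0, 0.4, 0.7],
    [5.0, 5.1, 0.8, 0.2], [5.2, 4.9, 0.1, 0.9],
    [4.8, 5.3, 0.5, 0.4], [5.1, 5.0, 0.3, 0.6],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

ensemble = build_ensemble(X, y)
print(ensemble, ensemble_predict(X, y, ensemble, [0.1, 0.1, 0.5, 0.5]))
```

Ranking candidate subsets by a complexity proxy before voting is one way to read "accuracy ensured by dataset complexity"; the paper itself may select or weight subsets differently.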
Keywords :
cancer; genetics; learning (artificial intelligence); medical computing; pattern classification; random processes; sampling methods; tumours; cancer classification; classification training point; classifier ensemble generation; ensemble construction scheme; feature subset; gene expression dataset complexity; k-nearest neighbor; nearest shrunken centroid method; random sampling; Cancer; Colon; DNA; Diversity reception; Error analysis; Filters; Gene expression; Predictive models; Statistical analysis; Voting;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence). IEEE International Joint Conference on
Conference_Location :
Hong Kong
ISSN :
1098-7576
Print_ISBN :
978-1-4244-1820-6
Electronic_ISBN :
1098-7576
Type :
conf
DOI :
10.1109/IJCNN.2008.4633831
Filename :
4633831