DocumentCode :
1845637
Title :
An Improved Density-Based Method for Reducing Training Data in KNN
Author :
Yongxia Jing ; Heping Gou ; Yaling Zhu
Author_Institution :
Dept. of Inf. Technol., Qiongtai Teachers Coll., Haikou, China
fYear :
2013
fDate :
21-23 June 2013
Firstpage :
972
Lastpage :
975
Abstract :
The k-nearest neighbor (KNN) algorithm is an effective text categorization algorithm in terms of recall and accuracy, but its computational overhead is directly proportional to the sample size, so classification is slow on large-scale sample data. To address this problem, the paper presents a density-based method for reducing training data: the method clusters the sample data of each class into several clusters, removes noisy samples, and then merges highly similar sample documents within each cluster into a single document. Experimental results indicate that the method reduces the computational overhead of KNN text classification, while its performance is approximately equal to that of traditional KNN.
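The abstract does not specify the density measure, thresholds, or merge rule, so the following is only a minimal Python sketch of the pipeline it outlines, under stated assumptions. Documents are raw term-frequency vectors compared by cosine similarity. DBSCAN, a standard density-based clustering algorithm swapped in here, clusters each class, and points it labels as noise are dropped. Within each cluster, documents whose similarity to a seed document reaches a threshold are summed into one vector. The function names and the parameters `eps`, `min_pts`, `merge_sim`, and `k` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt

Vector = dict[str, float]


def tf_vector(text: str) -> Vector:
    """Raw term-frequency vector (assumption: whitespace tokens, no IDF or stemming)."""
    return {t: float(c) for t, c in Counter(text.lower().split()).items()}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def dbscan(vecs: list[Vector], eps: float, min_pts: int) -> list[list[int]]:
    """DBSCAN on cosine similarity: j neighbours i if sim >= eps (i counts
    as its own neighbour). Indices that end up in no cluster are noise."""
    n = len(vecs)
    neigh = [[j for j in range(n) if cosine(vecs[i], vecs[j]) >= eps] for i in range(n)]
    label = [None] * n  # cluster id per point; None = unassigned
    clusters: list[list[int]] = []
    for i in range(n):
        # start a new cluster only from an unassigned core point
        if label[i] is not None or len(neigh[i]) < min_pts:
            continue
        cid = len(clusters)
        label[i] = cid
        members, queue = [], [i]
        while queue:
            p = queue.pop()
            members.append(p)
            if len(neigh[p]) >= min_pts:  # only core points expand the cluster
                for q in neigh[p]:
                    if label[q] is None:
                        label[q] = cid
                        queue.append(q)
        clusters.append(members)
    return clusters


def merge_similar(vecs: list[Vector], members: list[int], merge_sim: float) -> list[Vector]:
    """Hypothetical merge rule: greedily sum every document whose similarity
    to the current seed is >= merge_sim into that seed's vector."""
    remaining, merged = list(members), []
    while remaining:
        seed = remaining.pop(0)
        combined, rest = dict(vecs[seed]), []
        for j in remaining:
            if cosine(vecs[seed], vecs[j]) >= merge_sim:
                for t, w in vecs[j].items():
                    combined[t] = combined.get(t, 0.0) + w
            else:
                rest.append(j)
        remaining = rest
        merged.append(combined)
    return merged


def reduce_training_set(docs, labels, eps=0.3, min_pts=2, merge_sim=0.6):
    """Cluster each class separately, drop noise, merge near-duplicates."""
    by_class: dict[str, list[Vector]] = {}
    for text, y in zip(docs, labels):
        by_class.setdefault(y, []).append(tf_vector(text))
    reduced = []
    for y, vecs in by_class.items():
        for members in dbscan(vecs, eps, min_pts):  # noise never enters a cluster
            reduced.extend((v, y) for v in merge_similar(vecs, members, merge_sim))
    return reduced


def knn_classify(text: str, training, k: int = 3) -> str:
    """Majority vote among the k most cosine-similar training vectors."""
    q = tf_vector(text)
    ranked = sorted(training, key=lambda vy: cosine(q, vy[0]), reverse=True)
    return Counter(y for _, y in ranked[:k]).most_common(1)[0][0]


DOCS = [
    "stock market shares fall",
    "stock market shares rise",
    "market shares trading stock",
    "football match goal score",
    "football match team score",
    "goal score team football",
    "random unrelated words here",  # isolated sample, expected to be noise
]
LABELS = ["finance"] * 3 + ["sport"] * 4

reduced = reduce_training_set(DOCS, LABELS)
print(len(reduced))                                           # 2
print(knn_classify("stock shares fall today", reduced, k=1))  # finance
```

On the seven toy documents, the unrelated sample is dropped as noise and each class's three similar documents are merged into one vector, leaving two training vectors; raising `merge_sim` to 0.9 disables merging and leaves six. Because this DBSCAN builds a full pairwise similarity list per class, the reduction step is itself quadratic in class size, a one-time cost paid before any classification.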
Keywords :
pattern classification; pattern clustering; text analysis; KNN algorithm; KNN text classification; classification speed; density-based method; documents; k-nearest neighbor algorithm; large-scale sample data; noise sample data reduction; sample data clustering; sample size; text categorization algorithm; training data reduction; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Noise; Support vector machine classification; Text categorization; Training; KNN text classification; samples reducing; similarity; text clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational and Information Sciences (ICCIS), 2013 Fifth International Conference on
Conference_Location :
Shiyang
Type :
conf
DOI :
10.1109/ICCIS.2013.261
Filename :
6643177