DocumentCode :
496109
Title :
KNN Text Categorization Algorithm Based on Semantic Centre
Author :
Zhang Xiao-fei ; Huang He-yan ; Zhang Ke-liang
Author_Institution :
Res. Center of C&L Inf. Eng., Chinese Acad. of Sci., Beijing, China
Volume :
1
fYear :
2009
fDate :
25-26 July 2009
Firstpage :
249
Lastpage :
252
Abstract :
As a classical statistical pattern recognition algorithm characterized by high accuracy and stability, KNN has been widely used in text categorization. However, because KNN's time complexity is directly proportional to the sample size, its classification speed is very slow. In this paper, we propose a new KNN text categorization algorithm based on semantic centre, which we call SKNN, to speed up classification. The basic idea is to replace the large number of original sample documents with a small number of sample semantic centers. Experiments show that SKNN classifies more than 10 times as fast as traditional KNN, and that its F1 value is approximately equal to that of SVM and traditional KNN.
Keywords :
data mining; learning (artificial intelligence); pattern classification; support vector machines; text analysis; KNN text categorization algorithm; SVM; machine learning; pattern classification; semantic centre; statistical pattern recognition algorithm; text mining; time complexity; Computer science; Information technology; Natural languages; Pattern recognition; Stability; Support vector machine classification; Support vector machines; Testing; Text categorization; Text mining; KNN; semantic center; text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Technology and Computer Science, 2009. ITCS 2009. International Conference on
Conference_Location :
Kiev
Print_ISBN :
978-0-7695-3688-0
Type :
conf
DOI :
10.1109/ITCS.2009.57
Filename :
5190062
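
The abstract's central idea, classifying against a few centers instead of every training document, can be illustrated with a minimal sketch. The abstract does not say how SKNN builds its semantic centers, so this sketch assumes one plain per-class mean vector as a stand-in; the function names, the toy vectors, and the class labels are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_centers(vectors, labels):
    """Average each class into one center.

    Assumption: a per-class mean stands in for the paper's semantic
    centers, whose actual construction the abstract does not describe.
    """
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        groups[label].append(vec)
    centers, center_labels = [], []
    for label, members in groups.items():
        centers.append([sum(col) / len(members) for col in zip(*members)])
        center_labels.append(label)
    return centers, center_labels


def knn_classify(query, points, point_labels, k):
    """Similarity-weighted vote among the k points most similar to query."""
    scored = sorted(
        ((cosine(query, p), lab) for p, lab in zip(points, point_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = defaultdict(float)
    for sim, lab in scored[:k]:
        votes[lab] += sim
    return max(votes, key=votes.get)


# Toy term-frequency vectors over a hypothetical 3-term vocabulary.
train = [[3, 0, 1], [2, 1, 0], [4, 0, 0], [0, 3, 1], [1, 4, 0], [0, 2, 2]]
labels = ["sports", "sports", "sports", "finance", "finance", "finance"]

centers, center_labels = build_centers(train, labels)
query = [2, 0, 1]

# Plain KNN compares the query with all 6 training documents.
plain = knn_classify(query, train, labels, k=3)
# The center-based variant compares it with only 2 centers.
reduced = knn_classify(query, centers, center_labels, k=1)
print(plain, reduced)
```

The number of similarity computations per query drops from the number of training documents to the number of centers, which is where a speed-up of the kind the abstract reports would come from. How closely accuracy is preserved depends on how the centers are built, which this sketch does not attempt to reproduce.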