DocumentCode
496109
Title
KNN Text Categorization Algorithm Based on Semantic Centre
Author
Zhang Xiao-fei ; Huang He-yan ; Zhang Ke-liang
Author_Institution
Res. Center of C&L Inf. Eng., Chinese Acad. of Sci., Beijing, China
Volume
1
fYear
2009
fDate
25-26 July 2009
Firstpage
249
Lastpage
252
Abstract
As a classical statistical pattern recognition algorithm characterized by high accuracy and stability, KNN has been widely used in text categorization. However, because KNN's time complexity is directly proportional to the sample size, its classification speed is very slow. In this paper, we propose a new KNN text categorization algorithm based on semantic centres, which we call SKNN, to speed up classification. The basic idea is to replace the large number of original sample documents with a small number of sample semantic centres. Experiments show that SKNN's classification is over 10 times as fast as that of traditional KNN, and that its F1 value is approximately equal to those of SVM and the traditional KNN algorithm.
Keywords
data mining; learning (artificial intelligence); pattern classification; support vector machines; text analysis; KNN text categorization algorithm; SVM; machine learning; pattern classification; semantic centre; statistical pattern recognition algorithm; text mining; time complexity; Computer science; Information technology; Natural languages; Pattern recognition; Stability; Support vector machine classification; Support vector machines; Testing; Text categorization; Text mining; KNN; semantic center; text categorization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Technology and Computer Science, 2009. ITCS 2009. International Conference on
Conference_Location
Kiev
Print_ISBN
978-0-7695-3688-0
Type
conf
DOI
10.1109/ITCS.2009.57
Filename
5190062
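Illustration
The abstract's core idea, replacing many training documents with a few representative centres before running KNN, can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' SKNN: the record does not say how semantic centres are computed, so the sketch substitutes a plain per-class mean vector, and the names build_centers and knn_classify, the toy vectors, and the labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_centers(samples):
    """Collapse (vector, label) samples into one mean vector per label.

    Assumption: a per-class mean stands in for the paper's semantic
    centre, whose actual construction is not given in the abstract.
    """
    groups = defaultdict(list)
    for vec, label in samples:
        groups[label].append(vec)
    return [
        ([sum(col) / len(vecs) for col in zip(*vecs)], label)
        for label, vecs in groups.items()
    ]


def knn_classify(query, references, k):
    """Majority vote among the k references most similar to the query."""
    ranked = sorted(references, key=lambda r: cosine(query, r[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy term-weight vectors (hypothetical data).
train = [
    ([3.0, 0.0, 1.0], "sports"),
    ([2.0, 1.0, 0.0], "sports"),
    ([0.0, 3.0, 1.0], "finance"),
    ([1.0, 2.0, 0.0], "finance"),
]
centers = build_centers(train)
query = [2.0, 0.0, 1.0]

# Traditional KNN compares the query with every training sample;
# the centre-based variant compares it with one vector per class.
label_full = knn_classify(query, train, k=3)
label_centers = knn_classify(query, centers, k=1)
print(label_full, label_centers, len(train), len(centers))
```

Because the number of similarity computations per query drops from the number of training documents to the number of centres, the cost of classification shrinks accordingly, which is the kind of speed-up the abstract describes.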