Title :
KNN Text Categorization Algorithm Based on Semantic Centre
Author :
Zhang Xiao-fei ; Huang He-yan ; Zhang Ke-liang
Author_Institution :
Res. Center of C&L Inf. Eng., Chinese Acad. of Sci., Beijing, China
Abstract :
As a classical statistical pattern recognition algorithm characterized by high accuracy and stability, KNN has been widely used in text categorization. However, because KNN's time complexity is directly proportional to the sample size, its classification speed is very slow. In this paper, we propose a new KNN text categorization algorithm based on semantic centres, which we call SKNN, to speed up classification. The basic idea is to replace the large number of original sample documents with a small number of sample semantic centres. Experiments show that SKNN's classification is over 10 times as fast as that of traditional KNN, and that its F1 value is approximately equal to those of SVM and the traditional KNN algorithm.
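The idea of comparing a query against a few centres instead of every training document can be illustrated with a minimal sketch. The abstract does not say how semantic centres are built, so this example assumes, purely for illustration, one centre per class computed as the mean term-frequency vector of that class. The function names (`vectorize`, `build_centres`, `classify`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Turn a text into a sparse term-frequency vector (a dict)."""
    return dict(Counter(text.lower().split()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_centres(samples):
    """Build one centre per class from (text, label) pairs.

    Assumption: a centre is the mean term-frequency vector of its class.
    The paper's actual semantic-centre construction may differ.
    """
    sums = defaultdict(Counter)
    counts = Counter()
    for text, label in samples:
        sums[label].update(vectorize(text))
        counts[label] += 1
    return [
        ({t: w / counts[label] for t, w in vec.items()}, label)
        for label, vec in sums.items()
    ]


def classify(text, centres, k=1):
    """Label a text by a similarity-weighted vote among its k nearest centres."""
    query = vectorize(text)
    scored = [(cosine(query, vec), label) for vec, label in centres]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for score, label in scored[:k]:
        votes[label] += score
    return votes.most_common(1)[0][0]


samples = [
    ("stock market shares rise", "finance"),
    ("bank shares fall market", "finance"),
    ("football match goal", "sport"),
    ("team wins match", "sport"),
]
centres = build_centres(samples)
print(classify("market shares", centres))
print(classify("football team", centres))
```

In this sketch each query is compared against one vector per class rather than against every training document, which is where a speed-up of the kind the abstract reports would come from. Using several centres per class with a larger `k` would be a natural variation.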
Keywords :
data mining; learning (artificial intelligence); pattern classification; support vector machines; text analysis; KNN text categorization algorithm; SVM; machine learning; pattern classification; semantic centre; statistical pattern recognition algorithm; text mining; time complexity; Computer science; Information technology; Natural languages; Pattern recognition; Stability; Support vector machine classification; Support vector machines; Testing; Text categorization; Text mining; KNN; semantic center; text categorization;
Conference_Titel :
Information Technology and Computer Science, 2009. ITCS 2009. International Conference on
Conference_Location :
Kiev
Print_ISBN :
978-0-7695-3688-0
DOI :
10.1109/ITCS.2009.57