Title of article :
An improved K-nearest-neighbor algorithm for text categorization
Author/Authors :
Jiang, Shengyi; Pang, Guansong; Wu, Meiling; Kuang, Limin
Issue Information :
Journal, serial issue, year 2012
Pages :
7
From page :
1503
To page :
1509
Abstract :
Text categorization is a significant tool for managing and organizing the surging volume of text data. Many text categorization algorithms have been explored in the previous literature, such as KNN, Naïve Bayes and Support Vector Machine. KNN text categorization is an effective but less efficient classification method. In this paper, we propose an improved KNN algorithm for text categorization, which builds the classification model by combining a constrained one-pass clustering algorithm with KNN text categorization. Empirical results on three benchmark corpora show that our algorithm can substantially reduce the text similarity computation and outperform state-of-the-art KNN, Naïve Bayes and Support Vector Machine classifiers. In addition, the classification model constructed by the proposed algorithm can be updated incrementally, and it scales well in many real-world applications.
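The abstract does not spell out the algorithm, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that "constrained" means a cluster may only hold training documents of a single class, that documents are already vectorized (e.g. TF-IDF), and that KNN then compares a test document against cluster centroids instead of every training document, which is one way the similarity computation could shrink. The class name `ClusteredKNN`, the `threshold` and `k` parameters, and the similarity-weighted vote are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two equal-length dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ClusteredKNN:
    """Hypothetical sketch: class-constrained one-pass clustering, then KNN over centroids."""

    def __init__(self, threshold=0.5, k=3):
        self.threshold = threshold  # assumed minimum similarity for joining an existing cluster
        self.k = k                  # assumed number of nearest clusters that vote
        self.clusters = []          # each entry: [label, vector_sum, size]

    def _centroid(self, cluster):
        _, vec_sum, size = cluster
        return [v / size for v in vec_sum]

    def partial_fit(self, vector, label):
        """Place one labelled document in a single pass; also serves as the incremental update."""
        best, best_sim = None, -1.0
        for cluster in self.clusters:
            if cluster[0] != label:  # assumed constraint: never mix classes in one cluster
                continue
            sim = cosine(vector, self._centroid(cluster))
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= self.threshold:
            best[1] = [s + v for s, v in zip(best[1], vector)]  # grow the running sum
            best[2] += 1
        else:
            self.clusters.append([label, list(vector), 1])  # start a new cluster
        return self

    def fit(self, vectors, labels):
        """One pass over the training set in the given order."""
        for vector, label in zip(vectors, labels):
            self.partial_fit(vector, label)
        return self

    def predict(self, vector):
        """Similarity-weighted vote among the k cluster centroids most similar to `vector`."""
        scored = sorted(
            ((cosine(vector, self._centroid(c)), c[0]) for c in self.clusters),
            reverse=True,
        )[: self.k]
        votes = defaultdict(float)
        for sim, label in scored:
            votes[label] += sim
        return max(votes, key=votes.get)


if __name__ == "__main__":
    docs = [[1, 1, 0, 0], [1, 0.9, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0.8]]
    labels = ["sports", "sports", "politics", "politics"]
    model = ClusteredKNN(threshold=0.8, k=1).fit(docs, labels)
    print(len(model.clusters))              # 4 training documents collapse into 2 centroids
    print(model.predict([0.9, 1, 0, 0]))    # sports
    model.partial_fit([1, 0, 0, 0], "tech")  # incremental update without retraining
    print(len(model.clusters))              # 3
```

In this sketch, classification cost grows with the number of clusters rather than the number of training documents, and adding a labelled document is just one more `partial_fit` call, which mirrors the incremental-update property the abstract claims. The result of one-pass clustering depends on document order and on the assumed threshold.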
Keywords :
KNN text categorization , One-pass clustering , Spam filtering , Text Categorization
Journal title :
Expert Systems with Applications
Serial Year :
2012
Record number :
2351012