• DocumentCode
    2991578
  • Title

    TKNN: An Improved KNN Algorithm Based on Tree Structure

  • Author

    Juan, Li

  • Author_Institution
    Sch. of Distance Educ., Shaanxi Normal Univ., Xi'an, China
  • fYear
    2011
  • fDate
    3-4 Dec. 2011
  • Firstpage
    1390
  • Lastpage
    1394
  • Abstract
    Text classification is the process of assigning documents to a set of predefined categories. It is widely used in applications such as web page categorization, email spam filtering, and document indexing. Many popular algorithms for text classification have been proposed, such as Naive Bayes, K-Nearest Neighbor (KNN), and Support Vector Machine (SVM). However, these approaches do not perform well in multi-class text classification because they rely heavily on linear classifiers. KNN is a simple and mature algorithm, but it cannot effectively handle overlapping category borders, unbalanced class samples, determination of the k value, and an overly large search space. In this paper, we propose TKNN, a new algorithm that incorporates a tree structure and an adaptive k-value method into the classical KNN algorithm. TKNN can overcome these shortcomings of KNN and improve the performance of multi-class text classification. Theoretical analysis and experimental results show that TKNN greatly improves classification efficiency compared with KNN.
  • Keywords
    pattern classification; support vector machines; text analysis; tree data structures; KNN algorithm; TKNN; Web page categorization; document assignment; document indexing; email spam filtering; fixed categories; k-nearest neighbor; linear classifiers; naive Bayes; support vector machine; text classification; tree structure; Accuracy; Algorithm design and analysis; Buildings; Classification algorithms; Complexity theory; Text categorization; Training; KNN; penalty parameter; unbalanced class samples
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Security (CIS), 2011 Seventh International Conference on
  • Conference_Location
    Hainan
  • Print_ISBN
    978-1-4577-2008-6
  • Type

    conf

  • DOI
    10.1109/CIS.2011.310
  • Filename
    6128351