• DocumentCode
    496109
  • Title

    KNN Text Categorization Algorithm Based on Semantic Centre

  • Author

    Zhang Xiao-fei ; Huang He-yan ; Zhang Ke-liang

  • Author_Institution
    Res. Center of C&L Inf. Eng., Chinese Acad. of Sci., Beijing, China
  • Volume
    1
  • fYear
    2009
  • fDate
    25-26 July 2009
  • Firstpage
    249
  • Lastpage
    252
  • Abstract
    As a classical statistical pattern recognition algorithm characterized by high accuracy and stability, KNN has been widely used in text categorization. However, because KNN's time complexity is directly proportional to the sample size, its classification speed is very slow. In this paper, we propose a new KNN text categorization algorithm based on semantic centres, which we call SKNN, to speed up classification. The basic idea is to replace the large number of original sample documents with a small number of sample semantic centres. Experiments show that SKNN's classification is over 10 times as fast as that of traditional KNN, and that its F1 value is approximately equal to those of SVM and the traditional KNN algorithm.
  • Keywords
    data mining; learning (artificial intelligence); pattern classification; support vector machines; text analysis; KNN text categorization algorithm; SVM; machine learning; pattern classification; semantic centre; statistical pattern recognition algorithm; text mining; time complexity; Computer science; Information technology; Natural languages; Pattern recognition; Stability; Support vector machine classification; Support vector machines; Testing; Text categorization; Text mining; KNN; semantic center; text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology and Computer Science, 2009. ITCS 2009. International Conference on
  • Conference_Location
    Kiev
  • Print_ISBN
    978-0-7695-3688-0
  • Type

    conf

  • DOI
    10.1109/ITCS.2009.57
  • Filename
    5190062
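
The idea stated in the abstract, comparing a query document against a few semantic centres instead of every training sample, can be illustrated with a minimal sketch. The abstract does not say how SKNN builds its centres, so everything below is an assumption made for illustration: one centre per category, computed as the normalized mean of that category's unit-length term vectors, and cosine similarity as the comparison measure. The names `build_centres` and `classify` are hypothetical and do not come from the paper.

```python
import math


def normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def build_centres(samples, labels):
    """Collapse the training vectors into one centre per category.

    Assumption: a centre is the normalized sum of the unit-length sample
    vectors of its category. The paper may construct centres differently.
    """
    sums = {}
    for vec, label in zip(samples, labels):
        unit = normalize(vec)
        if label not in sums:
            sums[label] = [0.0] * len(unit)
        sums[label] = [a + b for a, b in zip(sums[label], unit)]
    return {label: normalize(total) for label, total in sums.items()}


def classify(centres, vec):
    """Return the category whose centre has the highest cosine similarity."""
    unit = normalize(vec)
    return max(
        centres,
        key=lambda label: sum(a * b for a, b in zip(centres[label], unit)),
    )


# Toy term-frequency vectors over a three-word vocabulary (made-up data).
samples = [[3, 0, 1], [2, 1, 0], [0, 2, 3], [1, 0, 4]]
labels = ["sports", "sports", "finance", "finance"]

centres = build_centres(samples, labels)
print(classify(centres, [4, 1, 0]))
print(classify(centres, [0, 1, 5]))
```

Under these assumptions, classifying a document costs one similarity computation per centre rather than one per training sample, which is the source of the speed-up the abstract reports.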