• DocumentCode
    3440002
  • Title

    An improved kNN learning based Korean text classifier with heuristic information

  • Author

    Lim, Heui-Seok

  • Author_Institution
    Dept. of Inf. & Commun., Cheonan Univ., South Korea
  • Volume
    2
  • fYear
    2002
  • fDate
    18-22 Nov. 2002
  • Firstpage
    731
  • Abstract
    Automatic text categorization is the problem of assigning predefined categories to free-text documents based on the likelihood suggested by a training set of labelled texts. The kNN learning based text classifier is a well-known statistical approach, and its algorithm is quite simple. While the method has been applied to many systems and has shown relatively good performance, a thorough evaluation of the method has rarely been done. Several parameters play important roles in the performance of the method: the decision function, the k value of kNN, and the size of the feature set. This paper focuses on improving a kNN learning based Korean text classifier by using heuristic information found experimentally. Our results show that a careful choice of these parameters is very significant both in improving the performance and in decreasing the size of the feature set.
  • Keywords
    indexing; learning (artificial intelligence); statistical analysis; text analysis; Korean text classifier; heuristic information; indexing; k-nearest neighbor method; labelled texts; learning; machine learning; noun extracting system; text categorization; training set; Content based retrieval; Euclidean distance; Indexing; Information retrieval; Machine learning; Machine learning algorithms; Nearest neighbor searches; Routing; Testing; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Information Processing, 2002. ICONIP '02. Proceedings of the 9th International Conference on
  • Print_ISBN
    981-04-7524-1
  • Type

    conf

  • DOI
    10.1109/ICONIP.2002.1198154
  • Filename
    1198154
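
The abstract names three parameters of a kNN text classifier: the decision function, the k value, and the size of the feature set. The sketch below is only an illustration of where those three parameters enter such a classifier, not the paper's method. Everything in it is an assumption: cosine similarity over term-frequency vectors, feature selection by document frequency, a choice between majority vote and similarity-weighted vote as the decision function, and the hypothetical names `select_features`, `vectorize`, `cosine`, and `knn_classify`. Documents are assumed to be already tokenized (for Korean text, for example by a noun extractor).

```python
import math
from collections import Counter


def select_features(docs, max_features):
    """Keep the max_features terms with the highest document frequency.

    Assumption: document frequency is used here only for illustration.
    """
    df = Counter()
    for tokens, _label in docs:
        df.update(set(tokens))
    # Sort by descending frequency, then alphabetically for determinism.
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term for term, _count in ranked[:max_features]}


def vectorize(tokens, features):
    """Term-frequency vector restricted to the chosen feature set."""
    return Counter(t for t in tokens if t in features)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(train, query_tokens, k=3, max_features=100, weighted=True):
    """Classify query_tokens using its k most similar training documents.

    weighted=True  -> decision function sums similarities per category.
    weighted=False -> decision function is a plain majority vote.
    """
    features = select_features(train, max_features)
    query = vectorize(query_tokens, features)
    scored = [(cosine(query, vectorize(tokens, features)), label)
              for tokens, label in train]
    scored.sort(key=lambda pair: -pair[0])  # most similar first
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim if weighted else 1.0
    # Ties are broken alphabetically so the result is deterministic.
    return max(sorted(votes), key=lambda label: votes[label])


if __name__ == "__main__":
    # Toy, made-up training set; not data from the paper.
    train = [
        (["stock", "market", "price"], "economy"),
        (["market", "trade", "export"], "economy"),
        (["game", "team", "score"], "sports"),
        (["team", "player", "league"], "sports"),
    ]
    print(knn_classify(train, ["market", "price", "trade"], k=3))
    print(knn_classify(train, ["team", "score"], k=3, weighted=False))
```

Varying `k`, `max_features`, and `weighted` on a held-out set is one simple way to explore the kind of parameter sensitivity the abstract describes.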