• DocumentCode
    2184596
  • Title

    Developing an effective Thai Document Categorization Framework base on term relevance frequency weighting

  • Author

    Chirawichitchai, Nivet ; Sa-nguansat, Parinya ; Meesad, Phayung

  • Author_Institution
    Dept. of Inf. Technol., King Mongkut's Univ. of Technol., Bangkok, Thailand
  • fYear
    2010
  • fDate
    24-25 Nov. 2010
  • Firstpage
    19
  • Lastpage
    23
  • Abstract
    Text categorization is the process of automatically assigning predefined categories to free-text documents. Feature weighting, which calculates feature (term) values in documents, is an important preprocessing technique in text categorization. In this paper, we propose a Thai Document Categorization Framework, focusing on the comparison of various term weighting schemes, including Boolean, tf, tf-idf, tfc, ltc, entropy, and tf-rf weighting. We evaluated these methods on a Thai news article corpus with three supervised learning classifiers. We found tf-rf weighting the most effective in our experiments with the SVM, NB, and DT algorithms. Based on our experiments, tf-rf weighting with the SVM algorithm yielded the best performance, with an F-measure of 95.9%.
  • Keywords
    learning (artificial intelligence); support vector machines; text analysis; Boolean weighting; SVM algorithm; Thai document categorization framework; ltc entropy weighting; supervised learning classifiers; term relevance frequency weighting; text categorization; tf weighting; tf-idf weighting; tf-rf weighting; tfc weighting; Classification algorithms; Entropy; Machine learning; Niobium; Support vector machines; Text categorization; Training; Supervised Learning; Term weighting; Text Categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Knowledge Engineering, 2010 8th International Conference on ICT and
  • Conference_Location
    Bangkok
  • ISSN
    2157-0981
  • Print_ISBN
    978-1-4244-9874-1
  • Electronic_ISBN
    2157-0981
  • Type

    conf

  • DOI
    10.1109/ICTKE.2010.5692907
  • Filename
    5692907
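
  • Illustrative_Sketch
    The abstract names tf-rf weighting but this record does not give its formula. The sketch below assumes the commonly cited definition of relevance frequency, rf = log2(2 + a / max(1, c)), where a is the number of positive-category documents containing the term and c is the number of other documents containing it; whether the paper uses exactly this form is an assumption. The function name `tf_rf_weights` and its parameters are hypothetical, and documents are assumed to be already segmented into tokens (Thai word segmentation is not shown).

```python
import math
from collections import Counter


def tf_rf_weights(docs, labels, positive_label):
    """Return (rf, weighted) for tokenized docs.

    Assumed definition (not taken from this record):
        rf(t) = log2(2 + a / max(1, c))
    a = positive-category docs containing t
    c = remaining docs containing t
    """
    pos_df = Counter()  # document frequency within the positive category
    neg_df = Counter()  # document frequency within all other categories
    for tokens, label in zip(docs, labels):
        target = pos_df if label == positive_label else neg_df
        for term in set(tokens):  # count each term once per document
            target[term] += 1

    vocabulary = set(pos_df) | set(neg_df)
    rf = {
        term: math.log2(2 + pos_df[term] / max(1, neg_df[term]))
        for term in vocabulary
    }

    # tf-rf: raw term frequency in the document times the term's rf value
    weighted = []
    for tokens in docs:
        tf = Counter(tokens)
        weighted.append({term: count * rf[term] for term, count in tf.items()})
    return rf, weighted
```

    Under this definition, a term that appears only outside the positive category gets rf = 1, so its weight reduces to plain tf, while terms concentrated in the positive category are boosted. For a multi-class corpus, the function would be called once per category with that category as `positive_label`.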