DocumentCode
2184596
Title
Developing an effective Thai Document Categorization Framework base on term relevance frequency weighting
Author
Chirawichitchai, Nivet ; Sa-nguansat, Parinya ; Meesad, Phayung
Author_Institution
Dept. of Inf. Technol., King Mongkut's Univ. of Technol., Bangkok, Thailand
fYear
2010
fDate
24-25 Nov. 2010
Firstpage
19
Lastpage
23
Abstract
Text categorization is the process of automatically assigning predefined categories to free-text documents. Feature weighting, which calculates feature (term) values in documents, is an important preprocessing technique in text categorization. In this paper, we propose a Thai document categorization framework that focuses on comparing various term weighting schemes, including Boolean, tf, tf-idf, tfc, ltc, entropy, and tf-rf weighting. We evaluated these methods on a Thai news article corpus with three supervised learning classifiers. In our experiments, tf-rf weighting was the most effective scheme with the SVM, NB, and DT algorithms. Using tf-rf weighting with the SVM algorithm yielded the best performance, with an F-measure of 95.9%.
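The abstract names tf-rf (term frequency times relevance frequency) without stating its formula. As a rough illustration only, the sketch below uses the commonly cited definition rf = log2(2 + a / max(1, c)), where a and c are the numbers of positive-category and negative-category training documents containing the term. Whether the paper uses exactly this variant is an assumption, as are the function names `rf` and `tf_rf`, the toy English tokens, and the premise that Thai text has already been segmented into word tokens.

```python
import math
from collections import Counter

def rf(term, docs, labels, positive):
    """Relevance frequency of `term` for the `positive` category (assumed formula)."""
    # a: positive-category documents that contain the term
    a = sum(1 for d, y in zip(docs, labels) if y == positive and term in d)
    # c: documents of all other categories that contain the term
    c = sum(1 for d, y in zip(docs, labels) if y != positive and term in d)
    return math.log2(2 + a / max(1, c))

def tf_rf(doc, docs, labels, positive):
    """tf-rf weight of every term in `doc`: raw term count times rf."""
    counts = Counter(doc)
    return {t: n * rf(t, docs, labels, positive) for t, n in counts.items()}

# Toy corpus: each document is a list of already-segmented tokens (hypothetical data).
docs = [
    ["goal", "team", "goal"],
    ["team", "match"],
    ["market", "stock"],
    ["stock", "team"],
]
labels = ["sport", "sport", "economy", "economy"]

weights = tf_rf(docs[0], docs, labels, positive="sport")
```

Under this definition, a term concentrated in the positive category ("goal") gets a larger rf than one that appears only in other categories ("stock"), which is the property that separates tf-rf from unsupervised schemes such as tf-idf.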
Keywords
learning (artificial intelligence); support vector machines; text analysis; Boolean weighting; SVM algorithm; Thai document categorization framework; ltc entropy weighting; supervised learning classifiers; term relevance frequency weighting; text categorization; tf weighting; tf-idf weighting; tf-rf weighting; tfc weighting; Classification algorithms; Entropy; Machine learning; Niobium; Support vector machines; Text categorization; Training; Supervised Learning; Term weighting; Text Categorization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Knowledge Engineering, 2010 8th International Conference on ICT and
Conference_Location
Bangkok
ISSN
2157-0981
Print_ISBN
978-1-4244-9874-1
Electronic_ISBN
2157-0981
Type
conf
DOI
10.1109/ICTKE.2010.5692907
Filename
5692907