DocumentCode :
3722747
Title :
A Hybrid Feature Selection Method for Vietnamese Text Classification
Author :
Nguyen Tri Hai;Nguyen Hoang Nghia;Tuan Dinh Le;Vu Thanh Nguyen
Author_Institution :
Univ. of Inf. Technol., Ho Chi Minh City, Vietnam
fYear :
2015
Firstpage :
91
Lastpage :
96
Abstract :
Text classification is an important task because of the huge volume of electronic documents. One of its main challenges is the high dimensionality of the feature space. Feature selection has been studied extensively for English text classification, but few studies have addressed Vietnamese text classification. This paper evaluates the performance of three widely used feature selection methods [2][6][10]: Chi-square (CHI), Information Gain (IG), and Document Frequency (DF). Based on this evaluation, we propose a hybrid feature selection method, called SIGCHI, which combines the Chi-square and Information Gain methods. Our experimental results show that the proposed method performs significantly better than the other methods: the accuracy of SIGCHI is up to 15.03% higher than that of CHI, up to 18.65% higher than that of IG, and up to 27.72% higher than that of DF.
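As an illustration of the three baseline criteria named in the abstract, the following is a minimal Python sketch that scores one term against one class using the standard textbook definitions of DF, CHI, and IG over a 2x2 document contingency table. The function name `term_scores`, the toy corpus, and the two-class setup are assumptions made for this example and are not taken from the paper. The abstract does not state how SIGCHI combines CHI and IG, so that step is not reproduced here.

```python
import math


def term_scores(docs, labels, term, positive):
    """Return (DF, CHI, IG) of `term` with respect to class `positive`.

    docs   : list of token sets, one per document (hypothetical input format)
    labels : list of class labels aligned with docs
    """
    n = len(docs)
    # 2x2 contingency counts over documents.
    a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == positive)
    b = sum(1 for doc, y in zip(docs, labels) if term in doc and y != positive)
    c = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == positive)
    d = n - a - b - c  # term absent, other class

    # Document Frequency: number of documents containing the term.
    df = a + b

    # Chi-square statistic for term/class independence.
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    chi = n * (a * d - c * b) ** 2 / denom if denom else 0.0

    def entropy(counts):
        total = sum(counts)
        if not total:
            return 0.0
        return -sum(k / total * math.log2(k / total) for k in counts if k)

    # Information Gain: class entropy minus entropy conditioned on term presence.
    ig = (entropy([a + c, b + d])
          - (a + b) / n * entropy([a, b])
          - (c + d) / n * entropy([c, d]))
    return df, chi, ig


# Toy word-segmented corpus (invented for illustration only).
docs = [{"bóng", "đá"}, {"bóng", "rổ"}, {"chứng", "khoán"}, {"ngân", "hàng"}]
labels = ["sport", "sport", "finance", "finance"]
print(term_scores(docs, labels, "bóng", "sport"))
```

In this toy corpus the term "bóng" occurs in every "sport" document and in no "finance" document, so it receives the highest CHI and IG scores available for a four-document, two-class set, while a term that appears in only one document scores lower on all three criteria.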
Keywords :
"Text categorization","Support vector machines","Training","Electronic mail","Feature extraction","Information technology","Cities and towns"
Publisher :
IEEE
Conference_Title :
Knowledge and Systems Engineering (KSE), 2015 Seventh International Conference on
Type :
conf
DOI :
10.1109/KSE.2015.25
Filename :
7371764