DocumentCode :
3723121
Title :
A Combined Approach for Filter Feature Selection in Document Classification
Author :
Le Nguyen Hoai Nam;Ho Bao Quoc
Author_Institution :
Sch. of Inf. Technol., VNUHCM - Univ. of Sci., Ho Chi Minh City, Vietnam
fYear :
2015
Firstpage :
317
Lastpage :
324
Abstract :
For a large set of documents, a bag-of-words vector can reach thousands of features. This high dimensionality makes document classification difficult: it not only increases computation cost but also degrades classification accuracy. The aim of filter feature selection is to remove irrelevant features by selecting a subset of the original feature set. In this paper, we analyze two filter feature selection approaches: the frequency-based approach and the cluster-based approach. We propose a hybrid filter feature selection method, named FCFS, that combines these approaches in order to exploit their strong points. We evaluate FCFS against related filter feature selection methods (CMFS, OCFS, CIIC, IG, and CHI) on two datasets covering news and medicine. In terms of Macro-F1, FCFS is superior to the other methods; in terms of Micro-F1, it shows comparable and even better performance than the other methods.
Keywords :
"Filtering algorithms","Training","Classification algorithms","Information filters","Iron","Filtering theory"
Publisher :
IEEE
Conference_Titel :
Tools with Artificial Intelligence (ICTAI), 2015 IEEE 27th International Conference on
ISSN :
1082-3409
Type :
conf
DOI :
10.1109/ICTAI.2015.56
Filename :
7372152