DocumentCode :
2735808
Title :
An Aggressive Feature Selection Method based on Rough Set Theory
Author :
Li, Fangtao ; Guan, Tao ; Zhang, Xian ; Zhu, Xiaoyan
Author_Institution :
Tsinghua Univ., Beijing
fYear :
2007
fDate :
5-7 Sept. 2007
Firstpage :
176
Lastpage :
176
Abstract :
Feature selection is an important component of text classification because it reduces data dimensionality. In this paper, we optimize Johnson's heuristic algorithm for rough set reduction and then propose an aggressive feature selection method for text categorization. The method combines the advantages of knowledge reduction in rough set (RS) theory with those of two conventional feature selection methods, information gain (IG) and document frequency (DF). This is the first time a rough set based feature selection method has been evaluated on the large-scale Reuters data set. The results show that the proposed method achieves higher categorization accuracy than IG and DF while using far fewer features. In addition, compared with the original rough set reduction, the proposed method significantly reduces computation time. For the Reuters data set, several discretization widths are adopted; with our method, the number of features is reduced by 93.5% and 88.4%, with decreases in F1 measure of only 0.61% and 0.13%, respectively.
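For readers unfamiliar with the rough set reduction step mentioned in the abstract, the following is a minimal sketch of the textbook Johnson greedy heuristic, not the optimized variant proposed in the paper. The function name `johnson_reduct`, the row/label data layout, and the tie-breaking rule are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def johnson_reduct(rows, labels):
    """Greedy (Johnson-style) approximation of a rough set reduct.

    rows   : list of tuples of discrete attribute values (hypothetical layout)
    labels : list of decision classes, one per row
    Returns a sorted list of selected attribute indices.
    """
    # One discernibility entry per pair of objects with different labels:
    # the set of attribute indices whose values differ between the pair.
    entries = []
    for (r1, y1), (r2, y2) in combinations(zip(rows, labels), 2):
        if y1 != y2:
            diff = frozenset(
                i for i, (a, b) in enumerate(zip(r1, r2)) if a != b
            )
            if diff:  # skip inconsistent pairs that no attribute separates
                entries.append(diff)

    reduct = []
    while entries:
        # Count how many remaining entries each attribute appears in.
        counts = Counter(attr for entry in entries for attr in entry)
        # Pick the most frequent attribute; ties go to the lowest index
        # (an arbitrary choice made for this sketch).
        best = min(counts, key=lambda a: (-counts[a], a))
        reduct.append(best)
        # Drop every entry already covered by the chosen attribute.
        entries = [e for e in entries if best not in e]
    return sorted(reduct)


# Toy decision table: only attribute 1 determines the label.
rows = [(1, 0, 0), (1, 1, 0), (0, 0, 1), (0, 1, 1)]
labels = [0, 1, 0, 1]
selected = johnson_reduct(rows, labels)
```

On this toy table the heuristic keeps a single attribute, which illustrates why reduct-based selection can be aggressive. The pairwise comparison is quadratic in the number of objects, which is consistent with the abstract's concern about computation time on large data sets.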
Keywords :
feature extraction; rough set theory; text analysis; aggressive feature selection method; document frequency; information gain; text classification; Computer science; Frequency; Heuristic algorithms; Intelligent systems; Laboratories; Large-scale systems; Set theory; Statistical distributions; Statistical learning; Text categorization
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Innovative Computing, Information and Control, 2007. ICICIC '07. Second International Conference on
Conference_Location :
Kumamoto
Print_ISBN :
0-7695-2882-1
Type :
conf
DOI :
10.1109/ICICIC.2007.125
Filename :
4427821