DocumentCode :
2371195
Title :
A feature selection framework for text filtering
Author :
Zheng, Zhaohui ; Srihari, Rohini ; Srihari, Sargur
Author_Institution :
CEDAR, State Univ. of New York, Buffalo, NY, USA
fYear :
2003
fDate :
19-22 Nov. 2003
Firstpage :
705
Lastpage :
708
Abstract :
We present a new framework for local feature selection in text filtering. In this framework, a feature set is constructed per category by first selecting a set of terms highly indicative of membership (the positive set) and another set of terms highly indicative of nonmembership (the negative set), and then combining these two sets. This framework not only unifies several standard feature selection methods, but also facilitates the proposal of a new method that optimally combines the positive and negative sets. The proposed method was compared experimentally with standard methods on six feature selection metrics: chi-square, correlation coefficient, odds ratio, GSS coefficient, and two proposed variants of odds ratio and GSS coefficient, OR-square and GSS-square, respectively. The results show that the proposed feature selection method improves text filtering performance.
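The per-category construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the signed correlation coefficient is used here as one example metric, and the function names, the `positive_ratio` parameter, and the toy document counts are all assumptions made for illustration. The abstract does not say how the optimal combination is found, so the ratio is simply passed in.

```python
import math

def correlation_coefficient(a, b, c, d):
    """Signed correlation coefficient for one term/category pair.

    a: in-category documents containing the term
    b: out-of-category documents containing the term
    c: in-category documents without the term
    d: out-of-category documents without the term
    Its square equals the chi-square statistic of the same 2x2 table.
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return math.sqrt(n) * (a * d - c * b) / math.sqrt(denom)

def select_features(scores, size, positive_ratio):
    """Split a feature budget between positive and negative terms.

    scores: dict mapping term -> signed score for one category.
    positive_ratio: assumed fraction of the budget given to the positive set.
    """
    n_pos = round(size * positive_ratio)
    n_neg = size - n_pos
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Highest positive scores indicate membership.
    positive = [t for t in ranked if scores[t] > 0][:n_pos]
    # Most negative scores indicate nonmembership.
    negative = [t for t in reversed(ranked) if scores[t] < 0][:n_neg]
    return positive, negative

# Hypothetical (a, b, c, d) counts for a made-up "sports" category.
counts = {
    "goal": (8, 1, 2, 9),
    "match": (6, 2, 4, 8),
    "stock": (1, 7, 9, 3),
    "market": (2, 6, 8, 4),
    "the": (5, 5, 5, 5),
}
scores = {term: correlation_coefficient(*c) for term, c in counts.items()}
positive, negative = select_features(scores, size=4, positive_ratio=0.5)
feature_set = positive + negative
print(positive, negative)
```

In this sketch, a ratio of 1.0 keeps only membership-indicating terms, while ranking by the squared score instead would mix both signs implicitly, which is the sense in which a single framework can cover several standard selection schemes.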
Keywords :
correlation methods; feature extraction; statistical analysis; text analysis; GSS coefficient; chi-square metric; correlation coefficient; data mining; feature selection method; feature set; text filtering; Chromium; Computer science; Data mining; Feedback; Frequency measurement; Gain measurement; Information filtering; Information filters; Mutual information; Proposals;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
Print_ISBN :
0-7695-1978-4
Type :
conf
DOI :
10.1109/ICDM.2003.1251013
Filename :
1251013