Title :
A feature selection framework for text filtering
Author :
Zheng, Zhaohui ; Srihari, Rohini ; Srihari, Sargur
Author_Institution :
CEDAR, State Univ. of New York, Buffalo, NY, USA
Abstract :
We present a new framework for local feature selection in text filtering. In this framework, a feature set is constructed per category by first selecting a set of terms highly indicative of membership (the positive set) and another set of terms highly indicative of nonmembership (the negative set), and then combining the two sets. The framework not only unifies several standard feature selection methods, but also facilitates the proposal of a new method that optimally combines the positive and negative sets. The proposed method was compared experimentally with standard methods on six feature selection metrics: chi-square, correlation coefficient, odds ratio, GSS coefficient, and two proposed variants, OR-square and GSS-square, derived from odds ratio and GSS coefficient respectively. The results show that the proposed feature selection method improves text filtering performance.
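The per-category construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the correlation coefficient as the scoring metric (a signed metric whose square is the chi-square statistic), fixes the sizes of the positive and negative sets by hand rather than choosing them optimally, and uses hypothetical function names (`correlation_coefficient`, `select_features`) and toy data.

```python
import math


def correlation_coefficient(a, b, c, d):
    """Signed term/category association from a 2x2 contingency table.

    a: in-category documents containing the term
    b: out-of-category documents containing the term
    c: in-category documents lacking the term
    d: out-of-category documents lacking the term
    Squaring the result gives the chi-square statistic.
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # term or category is degenerate; treat as uninformative
    return math.sqrt(n) * (a * d - c * b) / math.sqrt(denom)


def select_features(docs, labels, n_positive, n_negative):
    """Return (positive_set, negative_set) for one category.

    docs: list of sets of terms; labels: list of bools (True = in category).
    n_positive and n_negative are assumed to be given; how to choose them
    well is the question the paper addresses and is not modelled here.
    """
    n_in = sum(labels)
    n_out = len(labels) - n_in
    vocab = set().union(*docs)
    scores = {}
    for term in vocab:
        a = sum(1 for doc, y in zip(docs, labels) if y and term in doc)
        b = sum(1 for doc, y in zip(docs, labels) if not y and term in doc)
        scores[term] = correlation_coefficient(a, b, n_in - a, n_out - b)
    # Rank by descending score; ties are broken alphabetically.
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    positive = [t for t in ranked if scores[t] > 0][:n_positive]
    negative = [t for t in reversed(ranked) if scores[t] < 0][:n_negative]
    return positive, negative


# Toy corpus (hypothetical): two in-category and two out-of-category documents.
docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"}, {"stock", "team"}]
labels = [True, True, False, False]
positive, negative = select_features(docs, labels, n_positive=1, n_negative=1)
features = positive + negative  # combined per-category feature set
print(positive, negative)
```

With an unsigned metric such as chi-square, strongly positive and strongly negative terms are ranked together, so the split between the two sets is implicit; scoring with a signed metric, as in this sketch, makes the split an explicit choice.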
Keywords :
correlation methods; feature extraction; statistical analysis; text analysis; GSS coefficient; chi-square metric; correlation coefficient; data mining; feature selection method; feature set; text filtering; Chromium; Computer science; Data mining; Feedback; Frequency measurement; Gain measurement; Information filtering; Information filters; Mutual information; Proposals
Conference_Title :
Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
Print_ISBN :
0-7695-1978-4
DOI :
10.1109/ICDM.2003.1251013