Title :
Introducing a family of linear measures for feature selection in text categorization
Author :
Combarro, Elís F. ; Montañés, Elena ; Díaz, Irene ; Ranilla, José ; Mones, Ricardo
Author_Institution :
Artificial Intelligence Center, Oviedo Univ., Gijon, Spain
Abstract :
Text categorization, which consists of automatically assigning documents to a set of categories, usually involves handling a huge number of features. Most of them are irrelevant, and others introduce noise that can mislead the classifiers. Thus, feature reduction is often performed to increase both the efficiency and the effectiveness of the classification. In this paper, we propose selecting relevant features by means of a family of linear filtering measures that are simpler than the measures usually applied for this purpose. We carry out experiments over two different corpora and find that the proposed measures perform better than the existing ones.
Keywords :
classification; feature extraction; information filtering; learning (artificial intelligence); pattern classification; text analysis; document classification; feature reduction; feature selection; linear filtering measures; machine learning; text categorization; Availability; Filtering; Frequency; Humans; Maximum likelihood detection; Nonlinear filters; Performance evaluation; Wrapping; filtering measures
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2005.149