DocumentCode :
1126056
Title :
Introducing a family of linear measures for feature selection in text categorization
Author :
Combarro, Elías F. ; Montañés, Elena ; Díaz, Irene ; Ranilla, José ; Mones, Ricardo
Author_Institution :
Artificial Intelligence Center, Oviedo Univ., Gijón, Spain
Volume :
17
Issue :
9
fYear :
2005
Firstpage :
1223
Lastpage :
1232
Abstract :
Text categorization, which consists of automatically assigning documents to a set of categories, usually involves the management of a huge number of features. Most of them are irrelevant and others introduce noise which could mislead the classifiers. Thus, feature reduction is often performed in order to increase the efficiency and effectiveness of the classification. In this paper, we propose to select relevant features by means of a family of linear filtering measures which are simpler than the usual measures applied for this purpose. We carry out experiments over two different corpora and find that the proposed measures perform better than the existing ones.
Keywords :
classification; feature extraction; information filtering; learning (artificial intelligence); pattern classification; text analysis; document classification; feature reduction; feature selection; linear filtering measures; machine learning; text categorization; Availability; Filtering; Frequency; Humans; Machine learning; Maximum likelihood detection; Nonlinear filters; Performance evaluation; Text categorization; Wrapping; Index Terms: text categorization, feature selection, filtering measures, machine learning
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2005.149
Filename :
1490529