DocumentCode
3767540
Title
Weighted Document Frequency for feature selection in text classification
Author
Baoli Li;Qiuling Yan;Zhenqiang Xu;Guicai Wang
Author_Institution
College of Information Science and Engineering, Henan University of Technology, Zhengzhou, CHINA
fYear
2015
Firstpage
132
Lastpage
135
Abstract
Past research has shown Document Frequency (DF) to be a simple yet quite effective measure for feature selection in text classification. It is calculated from the number of documents in a collection that contain a feature, which can be a word, a phrase, an n-gram, or a specially derived attribute. The counting process is binary: if a feature appears in a document, its DF is increased by one. This traditional DF metric considers only whether a feature appears in a document, not how important the feature is in that document. Document frequency counted in this way is therefore likely to introduce considerable noise. A weighted document frequency (WDF) is proposed to reduce such noise to some extent. Extensive experiments on two text classification datasets demonstrate the effectiveness of the proposed measure.
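The contrast the abstract draws between binary and weighted counting can be illustrated with a minimal sketch. The abstract does not state the paper's weighting formula, so the weight used here (term frequency divided by the largest term frequency in the same document) is only an assumption chosen for illustration. The function names `document_frequency` and `weighted_document_frequency` and the toy documents are likewise hypothetical.

```python
from collections import Counter


def document_frequency(docs):
    """Binary DF: every document that contains a feature adds exactly 1."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so repeats inside one document count once
    return df


def weighted_document_frequency(docs):
    """Illustrative WDF: every document adds a weight in (0, 1].

    ASSUMPTION: the weight is tf / max_tf within the document. The paper's
    actual weighting scheme is not given in the abstract.
    """
    wdf = Counter()
    for tokens in docs:
        tf = Counter(tokens)
        if not tf:
            continue  # skip empty documents
        max_tf = max(tf.values())
        for term, count in tf.items():
            wdf[term] += count / max_tf
    return wdf


# Toy corpus: each document is a list of tokens.
docs = [
    ["wheat", "wheat", "wheat", "price"],
    ["price", "price", "wheat"],
    ["storage", "price"],
]

df = document_frequency(docs)
wdf = weighted_document_frequency(docs)

for term in sorted(df):
    print(f"{term}: DF={df[term]}, WDF={wdf[term]:.3f}")
```

Under this assumed weighting, a feature that is only incidental in a document contributes less than one to its score, so a term mentioned once in passing is ranked lower than under plain DF.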
Keywords
"Text recognition","Standards","Data collection","Silicon","Irrigation","Local area networks"
Publisher
IEEE
Conference_Titel
Asian Language Processing (IALP), 2015 International Conference on
Print_ISBN
978-1-4673-9595-3
Type
conf
DOI
10.1109/IALP.2015.7451549
Filename
7451549