DocumentCode
2861367
Title
An Empirical Study of Feature Selection for Text Categorization based on Term Weightage
Author
How, Bong Chih ; Narayanan, K.
Author_Institution
Universiti Malaysia Sarawak
fYear
2004
fDate
20-24 Sept. 2004
Firstpage
599
Lastpage
602
Abstract
This paper proposes a local feature selection (FS) measure, namely Categorical Descriptor Term (CTD), for text categorization. It is derived from the classic term weighting scheme TFIDF. The method explicitly chooses a feature set for each category by selecting terms only from the relevant category. Although past literature has suggested that using features from irrelevant categories can improve text categorization performance, we believe that incorporating only relevant features can be highly effective. An experimental comparison is carried out between CTD and five well-known feature selection measures: Information Gain, Chi-Square, Correlation Coefficient, Odds Ratio and GSS Coefficient. The results show that the proposed method performs comparably to the other FS measures, especially on collections with highly overlapping topics.
Keywords
Computer science; Dictionaries; Frequency; Gain measurement; Indexing; Information retrieval; Information technology; Natural languages; Performance evaluation; Text categorization
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
Print_ISBN
0-7695-2100-2
Type
conf
DOI
10.1109/WI.2004.10060
Filename
1410876