• DocumentCode
    2861367
  • Title
    An Empirical Study of Feature Selection for Text Categorization based on Term Weightage
  • Author
    How, Bong Chih ; Narayanan, K.
  • Author_Institution
    Universiti Malaysia Sarawak
  • fYear
    2004
  • fDate
    20-24 Sept. 2004
  • Firstpage
    599
  • Lastpage
    602
  • Abstract
    This paper proposes a local feature selection (FS) measure, namely Categorical Descriptor Term (CTD), for text categorization. It is derived from the classic term weighting scheme TFIDF. The method explicitly chooses a feature set for each category by selecting terms only from the relevant category. Although past literature has suggested that using features from irrelevant categories can improve text categorization performance, we believe that incorporating only relevant features can be highly effective. An experimental comparison is carried out between CTD and five well-known feature selection measures: Information Gain, Chi-Square, Correlation Coefficient, Odds Ratio and GSS Coefficient. The results show that our proposed method can perform comparably well to the other FS measures, especially on collections with highly overlapping topics.
  • Keywords
    Computer science; Dictionaries; Frequency; Gain measurement; Indexing; Information retrieval; Information technology; Natural languages; Performance evaluation; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
  • Print_ISBN
    0-7695-2100-2
  • Type
    conf
  • DOI
    10.1109/WI.2004.10060
  • Filename
    1410876