  • DocumentCode
    2727575
  • Title
    Feature Reduction for Text Categorization Using Cluster-Based Discriminant Coefficient
  • Author
    Li-Ju Gao; Been-Chian Chien
  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Univ. of Tainan, Tainan, Taiwan
  • fYear
    2012
  • fDate
    16-18 Nov. 2012
  • Firstpage
    137
  • Lastpage
    142
  • Abstract
    Text classification is an important research topic for managing large numbers of electronic documents. Feature reduction is the key issue for text classification with high-dimensional keywords. A document analysis method called the discriminant coefficient was proposed to reduce features and achieve high-precision text classification. However, the main problem of this discriminant-based feature reduction method is that the final number of reduced features is exactly equal to the number of document classes. Although the method achieves high classification precision, its recall is relatively low. In this paper, we propose an improvement to the discriminant coefficient analysis method. We apply a simple clustering method to separate the documents within each document class, preserving hidden differences among keywords in the same class. The clustering results allow the number of reduced features to be adjusted flexibly. The experimental results show that the proposed clustering mechanism supports adaptive feature reduction and that both recall and the F1 measure are improved.
  • Keywords
    feature extraction; pattern classification; pattern clustering; statistical analysis; text analysis; F1 measurements; cluster-based discriminant coefficient; clustering mechanism; discriminant based feature reduction method; discriminant coefficient; document analysis method; electronic documents; high dimensional keywords; high-precision text classification; text categorization; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Clustering methods; Feature extraction; Matrix converters; Text categorization; classification; discriminant coefficient; feature clustering; feature reduction;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Technologies and Applications of Artificial Intelligence (TAAI), 2012 Conference on
  • Conference_Location
    Tainan
  • Print_ISBN
    978-1-4673-4976-5
  • Type
    conf
  • DOI
    10.1109/TAAI.2012.16
  • Filename
    6395020
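The central mechanism in the abstract is to split each class into clusters so that the number of reduced features follows the number of clusters rather than the number of classes. A minimal Python sketch can illustrate it. This is not the authors' implementation. The abstract gives neither the discriminant coefficient formula nor the clustering algorithm, so the following are all hypothetical: the k-means-style clustering, the term-share weighting used in place of the discriminant coefficient, the toy documents, and every function name (`cluster_class`, `build_pseudo_classes`, `term_weights`, `project`).

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical sketch: documents are bags of words (term -> frequency).


def cosine(a, b):
    """Cosine similarity between two sparse term-weight maps."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(docs):
    """Mean term-frequency vector of a non-empty list of documents."""
    total = Counter()
    for d in docs:
        total.update(d)
    return {t: v / len(docs) for t, v in total.items()}


def cluster_class(docs, k, iters=10):
    """Split one class into at most k clusters with a k-means-style loop on cosine similarity.

    Assumption: the abstract only says 'a simple clustering method'; this choice is illustrative.
    """
    k = max(1, min(k, len(docs)))
    centers = [dict(d) for d in docs[:k]]  # deterministic init: first k documents
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in docs:
            best = max(range(k), key=lambda j: cosine(d, centers[j]))
            groups[best].append(d)
        # Keep the old center when a cluster ends up empty.
        centers = [centroid(g) if g else centers[j] for j, g in enumerate(groups)]
    return [g for g in groups if g]


def build_pseudo_classes(labeled_docs, k_per_class):
    """Replace each class by its clusters; returns (class_label, cluster_docs) pairs."""
    by_class = defaultdict(list)
    for doc, label in labeled_docs:
        by_class[label].append(doc)
    pseudo = []
    for label in sorted(by_class):
        for group in cluster_class(by_class[label], k_per_class):
            pseudo.append((label, group))
    return pseudo


def term_weights(pseudo):
    """Stand-in term score per pseudo-class: share of the term's total frequency in that cluster.

    NOT the paper's discriminant coefficient, whose formula the abstract does not give.
    """
    counts = []
    for _, group in pseudo:
        c = Counter()
        for d in group:
            c.update(d)
        counts.append(c)
    totals = Counter()
    for c in counts:
        totals.update(c)
    return [{t: c[t] / totals[t] for t in c} for c in counts]


def project(doc, weights):
    """Reduce a document to one feature per pseudo-class."""
    return [sum(f * w.get(t, 0.0) for t, f in doc.items()) for w in weights]


if __name__ == "__main__":
    train = [
        (Counter({"goal": 3, "match": 1}), "sport"),
        (Counter({"tennis": 2, "serve": 2}), "sport"),
        (Counter({"vote": 2, "party": 1}), "politics"),
        (Counter({"tax": 3, "budget": 1}), "politics"),
    ]
    for k in (1, 2):
        weights = term_weights(build_pseudo_classes(train, k))
        print(f"k={k}: {len(weights)} reduced features", project(train[0][0], weights))
```

With one cluster per class, the sketch yields exactly as many features as classes, which mirrors the limitation the abstract describes. Raising `k_per_class` increases the feature count while leaving the training labels unchanged.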