• DocumentCode
    1974336
  • Title

    Improvement and Application of TF·IDF Method Based on Text Classification

  • Author

    Kuang, Qiaoyan ; Xu, Xiaoming

  • Author_Institution
    Comput. Dept., Hunan Int. Econ. Univ., Changsha, China
  • fYear
    2010
  • fDate
    20-22 Aug. 2010
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Feature extraction is an important prerequisite for classifying text effectively and automatically. TF·IDF is widely used to express text feature weights, but it has some problems: it cannot reflect the distribution of terms in the text, and therefore cannot reflect their degree of importance or the differences between categories. This paper proposes a new feature weighting method, TF·IDF·Ci, which adds a new weight Ci to the original TF·IDF to express the differences between classes. When TF·IDF·Ci is combined with a specific classification algorithm, it always yields a larger macro F1 value than TF·IDF. Meanwhile, the standard deviation of the classification index for TF·IDF·Ci is much smaller than that for TF·IDF. This shows that TF·IDF·Ci not only improves classification precision but also decreases sensitivity to feature dimensions to some extent.
  • Keywords
    feature extraction; text analysis; TF·IDF method; TF·IDF·Ci method; feature weighting method; text classification; Classification algorithms; Computers; Economics; Sensitivity; Support vector machine classification; Text categorization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Internet Technology and Applications, 2010 International Conference on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4244-5142-5
  • Electronic_ISBN
    978-1-4244-5143-2
  • Type

    conf

  • DOI
    10.1109/ITAPP.2010.5566113
  • Filename
    5566113