• DocumentCode
    3301555
  • Title
    A novel feature weight algorithm for text categorization
  • Author
    Shang, Wenqian ; Dong, Hongbin ; Zhu, Haibin ; Wang, Yongbin
  • Author_Institution
    Sch. of Comput., Commun. Univ. of China, Beijing
  • fYear
    2008
  • fDate
    19-22 Oct. 2008
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    With the development of the Web, large numbers of documents are being put onto the Internet, and more and more digital libraries, news sources, and internal company data are becoming available. Automatic text categorization is therefore increasingly important for dealing with massive data. However, text preprocessing is still the bottleneck of text categorization based on the vector space model (VSM): the result of text preprocessing directly affects the performance and precision of categorization. Within text preprocessing, feature selection and feature weighting are the major obstacles. In this paper, we focus mainly on feature weighting. We present a novel feature weight algorithm, TF-Gini, that can improve categorization performance significantly. Experimental results verify the effectiveness of this algorithm.
  • Keywords
    Internet; text analysis; vectors; TF-Gini; World Wide Web; feature weight algorithm; text categorization; text preprocessing; vector space model; Acoustic noise; Computer science; Electronic mail; Entropy; Frequency; Software libraries; Support vector machine classification; Support vector machines
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2008. NLP-KE '08. International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-4515-8
  • Electronic_ISBN
    978-1-4244-2780-2
  • Type
    conf
  • DOI
    10.1109/NLPKE.2008.4906817
  • Filename
    4906817