• DocumentCode
    2199226
  • Title
    Improved Terms Weighting Algorithm of Text
  • Author
    Ma Zhanguo ; Feng Jing ; Hu Xiangyi ; Shi Yanqin ; Chen Liang
  • Author_Institution
    Dept. of Inf. Technol., Beijing Sci. & Technol. Inf. Inst., Beijing, China
  • Volume
    2
  • fYear
    2011
  • fDate
    14-15 May 2011
  • Firstpage
    367
  • Lastpage
    370
  • Abstract
    Most traditional information retrieval and automatic text classification methods based on the vector space model need to determine the weights of feature terms. Term weighting plays an important role in achieving high performance in information retrieval and text classification. The most popular method uses term frequency (tf) and inverse document frequency (idf) to represent the importance of terms and compute their weights. However, the tf-idf model does not incorporate class information, information from important sections such as the title, abstract, and conclusion, or information about synonymous words. This paper proposes an improved term weighting method that incorporates this information. Experimental results show that performance is improved: the role of important and representative terms is strengthened, and the influence of unimportant feature terms on retrieval and classification is reduced. In addition, the F1 score obtained with the new algorithm is higher than that obtained with the traditional tf-idf model.
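    For readers unfamiliar with the baseline, the sketch below computes plain tf-idf weights (tf × log(N/df)) and, optionally, two hypothetical extensions in the spirit of the abstract: a per-section multiplier so that occurrences in the title, abstract, or conclusion count more, and a synonym map that folds synonymous words into one term. The multiplier values, the synonym map, and the names `SECTION_WEIGHTS`, `weighted_tf`, and `tf_idf` are illustrative assumptions, not the paper's formula, and the sketch does not model class information.

```python
import math
from collections import Counter

# Hypothetical section multipliers (an assumption, NOT taken from the paper):
# an occurrence in an important section counts more than one in the body.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "conclusion": 1.5, "body": 1.0}


def weighted_tf(doc, synonyms=None, section_weights=None):
    """Term frequency of one document given as {section name: [tokens]}.

    synonyms maps a token to a canonical term so synonymous words share one
    count; section_weights multiplies each occurrence by its section's weight
    (default 1.0, which yields plain raw-count tf).
    """
    synonyms = synonyms or {}
    section_weights = section_weights or {}
    tf = Counter()
    for section, tokens in doc.items():
        weight = section_weights.get(section, 1.0)
        for token in tokens:
            tf[synonyms.get(token, token)] += weight
    return tf


def tf_idf(docs, synonyms=None, section_weights=None):
    """Return one {term: tf * log(N / df)} dict per document."""
    tfs = [weighted_tf(d, synonyms, section_weights) for d in docs]
    n = len(docs)
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())  # each distinct term counts once per document
    return [{t: f * math.log(n / df[t]) for t, f in tf.items()} for tf in tfs]


docs = [
    {"title": ["svm", "classification"], "body": ["text", "classification", "svm"]},
    {"title": ["retrieval"], "body": ["text", "search", "engine"]},
]
plain = tf_idf(docs)
boosted = tf_idf(docs, synonyms={"search": "retrieval"}, section_weights=SECTION_WEIGHTS)
print(round(plain[0]["svm"], 3))          # 1.386 (tf 2 x log 2)
print(round(boosted[1]["retrieval"], 3))  # 2.773 (3 from title + 1 via "search", x log 2)
```

    With both extensions left off, `tf_idf` reduces to classic raw-count tf-idf. A term that occurs in every document (here `text`) gets weight 0 either way, because log(N/df) = log(1) = 0.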
  • Keywords
    information retrieval; pattern classification; text analysis; improved terms weighting algorithm; information retrieval; inverse document frequency; term frequency; text classification; tf-idf model; vector space model; Classification algorithms; Computers; Equations; Information retrieval; Mathematical model; Support vector machine classification; Text categorization; information retrieval; term weighting; text classification; vector space model;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Network Computing and Information Security (NCIS), 2011 International Conference on
  • Conference_Location
    Guilin
  • Print_ISBN
    978-1-61284-347-6
  • Type
    conf
  • DOI
    10.1109/NCIS.2011.171
  • Filename
    5948854