• DocumentCode
    1854158
  • Title

    Study on Feature Selection and Weighting Based on Synonym Merge in Text Categorization

  • Author

    Lu, Zhenyu ; Liu, Yongmin ; Zhao, Shuang ; Chen, Xuebin

  • Author_Institution
    Coll. of Econ. & Manage., Hebei Polytech. Univ., Tangshan, China
  • fYear
    2010
  • fDate
    22-24 Jan. 2010
  • Firstpage
    105
  • Lastpage
    109
  • Abstract
    Feature selection and weighting is one of the key problems in text categorization. The chief obstacles to feature selection are noise and sparseness. This paper presents an approach to Chinese text feature selection and weighting based on semantic statistics. First, we use synonymous concepts to extract feature values from text, based on the thesaurus TongYiCi CiLin. Then, we introduce a new weight function based on term frequency and entropy, which adjusts the effect of a feature term in the classifier according to the feature term's strength. Experiments show that our method performs much better than several traditional feature selection methods and improves the performance of text categorization systems.
  • Keywords
    natural language processing; pattern classification; text analysis; classifier; feature weighting; synonym merge; text categorization; text feature selection; Conference management; Educational institutions; Electronic mail; Entropy; Frequency; Information retrieval; Statistics; Thesauri; Vocabulary; TongYiCi CiLin; feature selection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Future Networks, 2010. ICFN '10. Second International Conference on
  • Conference_Location
    Sanya, Hainan
  • Print_ISBN
    978-0-7695-3940-9
  • Electronic_ISBN
    978-1-4244-5667-3
  • Type

    conf

  • DOI
    10.1109/ICFN.2010.70
  • Filename
    5431872