DocumentCode
1854158
Title
Study on Feature Selection and Weighting Based on Synonym Merge in Text Categorization
Author
Lu, Zhenyu; Liu, Yongmin; Zhao, Shuang; Chen, Xuebin
Author_Institution
Coll. of Econ. & Manage., Hebei Polytech. Univ., Tangshan, China
fYear
2010
fDate
22-24 Jan. 2010
Firstpage
105
Lastpage
109
Abstract
Feature selection and weighting is one of the key problems in text categorization. The chief obstacles to feature selection are noise and sparseness. This paper presents an approach to Chinese text feature selection and weighting based on semantic statistics. First, we use synonymous concepts to extract feature values from text, based on the thesaurus TongYiCi CiLin. Then, we introduce a new weight function based on term frequency and entropy, which adjusts the effect of each feature term in the classifier according to the term's strength. Experiments show that our method performs much better than several traditional feature selection methods and that it improves the performance of text categorization systems.
Keywords
natural language processing; pattern classification; text analysis; classifier; feature weighting; synonym merge; text categorization; text feature selection; Conference management; Educational institutions; Electronic mail; Entropy; Frequency; Information retrieval; Statistics; Thesauri; Vocabulary; TongYiCi CiLin; feature selection
fLanguage
English
Publisher
IEEE
Conference_Titel
Future Networks, 2010. ICFN '10. Second International Conference on
Conference_Location
Sanya, Hainan
Print_ISBN
978-0-7695-3940-9
Electronic_ISBN
978-1-4244-5667-3
Type
conf
DOI
10.1109/ICFN.2010.70
Filename
5431872