DocumentCode :
1854158
Title :
Study on Feature Selection and Weighting Based on Synonym Merge in Text Categorization
Author :
Lu, Zhenyu ; Liu, Yongmin ; Zhao, Shuang ; Chen, Xuebin
Author_Institution :
Coll. of Econ. & Manage., Hebei Polytech. Univ., Tangshan, China
fYear :
2010
fDate :
22-24 Jan. 2010
Firstpage :
105
Lastpage :
109
Abstract :
Feature selection and weighting are among the key problems in text categorization. The chief obstacles to feature selection are noise and sparseness. This paper presents an approach to Chinese text feature selection and weighting based on semantic statistics. First, we use synonymous concepts to extract feature values from text, based on the thesaurus TongYiCi CiLin. Then, we introduce a new weight function based on term frequency and entropy, which adjusts the effect of each feature term in the classifier according to the feature term's strength. Experiments show that our method performs much better than several traditional feature selection methods and improves the performance of text categorization systems.
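The abstract does not give the weight function itself, so the following is only a minimal sketch of the two steps it names, under stated assumptions. The toy thesaurus, the concept ids, and the function names `merge_synonyms` and `entropy_weights` are all hypothetical, and the weight (total term frequency scaled by one minus the normalized entropy of the term's distribution over classes) is an assumed form, not the authors' formula.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus mapping words to concept ids.
# It only stands in for TongYiCi CiLin; the entries are illustrative.
TOY_THESAURUS = {
    "car": "C_vehicle",
    "automobile": "C_vehicle",
    "doctor": "C_physician",
    "physician": "C_physician",
}


def merge_synonyms(tokens, thesaurus=TOY_THESAURUS):
    """Replace each token by its concept id; unknown tokens are kept as-is."""
    return [thesaurus.get(t, t) for t in tokens]


def entropy_weights(docs_by_class):
    """Assumed weighting: total tf * (1 - normalized entropy over classes).

    docs_by_class maps a class label to a list of tokenized documents.
    A feature concentrated in one class keeps its full frequency as weight;
    a feature spread evenly over all classes gets a weight near zero.
    """
    # Term frequency of each merged feature, per class.
    class_tf = {
        label: Counter(f for doc in docs for f in merge_synonyms(doc))
        for label, docs in docs_by_class.items()
    }
    n_classes = len(class_tf)
    features = set().union(*class_tf.values())

    weights = {}
    for f in features:
        counts = [class_tf[label][f] for label in class_tf]
        total = sum(counts)
        probs = [n / total for n in counts if n > 0]
        h = -sum(p * math.log(p) for p in probs)
        h_norm = h / math.log(n_classes) if n_classes > 1 else 0.0
        weights[f] = total * (1.0 - h_norm)
    return weights


corpus = {
    "auto": [["car", "fast"], ["automobile", "road"]],
    "health": [["doctor", "fast"], ["physician"]],
}
weights = entropy_weights(corpus)
```

In this toy corpus, "car" and "automobile" collapse into one feature that occurs only in the "auto" class, so it keeps its full frequency as weight, while "fast" is spread evenly over both classes and is weighted down to about zero.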
Keywords :
natural language processing; pattern classification; text analysis; classifier; feature weighting; synonym merge; text categorization; text feature selection; Conference management; Educational institutions; Electronic mail; Entropy; Frequency; Information retrieval; Statistics; Thesauri; Vocabulary; TongYiCi CiLin; feature selection
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Future Networks, 2010. ICFN '10. Second International Conference on
Conference_Location :
Sanya, Hainan
Print_ISBN :
978-0-7695-3940-9
Electronic_ISBN :
978-1-4244-5667-3
Type :
conf
DOI :
10.1109/ICFN.2010.70
Filename :
5431872