DocumentCode :
2199226
Title :
Improved Terms Weighting Algorithm of Text
Author :
Ma Zhanguo ; Feng Jing ; Hu Xiangyi ; Shi Yanqin ; Chen Liang
Author_Institution :
Dept. of Inf. Technol., Beijing Sci. & Technol. Inf. Inst., Beijing, China
Volume :
2
fYear :
2011
fDate :
14-15 May 2011
Firstpage :
367
Lastpage :
370
Abstract :
Most traditional information retrieval and automatic text classification methods based on the vector space model need to determine the weights of the feature terms. Term weighting plays an important role in achieving high performance in information retrieval and text classification. The popular method uses term frequency (tf) and inverse document frequency (idf) to represent the importance of terms and compute their weights. However, the tf-idf model does not take into account class information, important document information such as the title, abstract, and conclusion, or synonym information. This paper presents an improved method for computing term weights that incorporates this information. The experimental results show that performance is enhanced: the role of important and representative terms is raised, and the effect of unimportant feature terms on retrieval and classification is decreased. In addition, the F1 score obtained with the new algorithm is higher than that obtained with the traditional tf-idf model.
Keywords :
information retrieval; pattern classification; text analysis; improved terms weighting algorithm; inverse document frequency; term frequency; text classification; tf-idf model; vector space model; Classification algorithms; Computers; Equations; Mathematical model; Support vector machine classification; Text categorization; term weighting
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Network Computing and Information Security (NCIS), 2011 International Conference on
Conference_Location :
Guilin
Print_ISBN :
978-1-61284-347-6
Type :
conf
DOI :
10.1109/NCIS.2011.171
Filename :
5948854