DocumentCode :
684818
Title :
An improved Global Weight Function of Terms based on Pearson's Chi-square statistics
Author :
Tian Xia ; Yanmei Chai ; Hong Lu ; Tong Wang
Author_Institution :
Dept. of Comput. & Inf. Sci., Shanghai Second Polytech. Univ., Shanghai, China
fYear :
2012
fDate :
7-9 Dec. 2012
Firstpage :
1
Lastpage :
5
Abstract :
Term frequency, the most popular discriminator used for term weighting in natural language processing (NLP), is not the only factor that needs to be considered when calculating a term weight that indicates term importance. This motivated us to investigate other statistical characteristics of terms, and we found an important discriminator: term distribution. This paper finds that a term whose distribution is close to hypo-dispersion usually carries much contextual information and should be given a higher weight than a term whose distribution is close to an intensive distribution. Based on this hypothesis, a term global weight function based on Pearson's chi-square theory is put forward. In addition, a text classifier system is developed based on the LSA (Latent Semantic Analysis) model, and its precision and recall results are used for evaluation; they confirm the reliability and efficiency of the algorithm. In conclusion, term distribution should be considered in term weighting as a new discriminator, and the algorithm in this paper is recommended.
Keywords :
natural language processing; pattern classification; statistics; text analysis; LSA model; NLP; Pearson Chi-square theory based term global weight function; Pearson chi-square statistics; hypo-dispersion distribution; latent semantic analysis model; natural language processing; precision and recall; term distribution; term frequency; term statistical characteristics; term weighting; text classifier system; IDF; Latent Semantic Analysis; Natural Language Processing; Pearson's Chi-square; Term Weight;
fLanguage :
English
Publisher :
IET
Conference_Titel :
Information Science and Control Engineering 2012 (ICISCE 2012), IET International Conference on
Conference_Location :
Shenzhen
Electronic_ISBN :
978-1-84919-641-3
Type :
conf
DOI :
10.1049/cp.2012.2404
Filename :
6755783