DocumentCode
1896376
Title
A Principle Component Analysis Based Method to Normalize Term Weights
Author
Xia, Tian ; Chai, Yanmei
Author_Institution
Dept. of Comput. & Inf. Sci., Shanghai Second Polytech. Univ., Shanghai, China
fYear
2010
fDate
25-26 Dec. 2010
Firstpage
1
Lastpage
4
Abstract
Term weighting is a significant step in document formalization for natural language processing, and it strongly affects the accuracy of natural language processing systems. A term weight consists of three parts: a global term weight, a local term weight, and a standardization factor. Many term weighting algorithms have been proposed for each part, and the final term weight is currently computed as the product of the outputs of several such algorithms. However, the results of different term weighting algorithms are correlated with one another, which means they carry redundant, overlapping information, so simply multiplying them yields an inaccurate final term weight. This paper proposes a term weight normalization method based on Principal Component Analysis (PCA) that removes the redundant, overlapping information and produces a more accurate final term weight.
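The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes, not the authors' method. It assumes each term has scores from several weighting schemes (here raw frequency, log-scaled frequency, and an IDF-like value, all made-up numbers), standardizes them, and projects them onto the first principal component of their correlation matrix, so that correlated schemes are not counted twice as they are in a plain product. The function name `pca_combine` and the sample data are hypothetical.

```python
import numpy as np

def pca_combine(scores):
    """Combine per-term scores from several weighting schemes via PCA.

    scores: array of shape (n_terms, n_schemes).
    Returns (combined, explained): the projection of each term onto the
    first principal component, and the share of variance it explains.
    Illustrative assumption only; the paper's exact procedure may differ.
    """
    X = np.asarray(scores, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # leave constant columns at zero
    Z = (X - X.mean(axis=0)) / std           # standardize each scheme
    corr = (Z.T @ Z) / len(Z)                # correlation matrix of the schemes
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    pc1 = eigvecs[:, -1]                     # direction of largest variance
    if pc1.sum() < 0:                        # eigenvector sign is arbitrary;
        pc1 = -pc1                           # pick one orientation consistently
    total = eigvals.sum()
    explained = float(eigvals[-1] / total) if total > 0 else 0.0
    return Z @ pc1, explained

# Hypothetical scores for four terms under three schemes.
tf = np.array([1.0, 2.0, 5.0, 10.0])
log_tf = 1.0 + np.log(tf)
idf = np.array([3.0, 1.0, 2.0, 0.5])
scores = np.column_stack([tf, log_tf, idf])

naive = scores.prod(axis=1)                  # the plain product the abstract criticizes
combined, explained = pca_combine(scores)
print("product:", naive)
print("pca    :", combined, "explained variance share:", explained)
```

Because `tf` and `log_tf` are strongly correlated, the product effectively counts term frequency twice, whereas the projection treats them as largely one direction of variation. Keeping more than one component, or rescaling the result to a positive range, are design choices this sketch leaves open.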
Keywords
document handling; natural language processing; principal component analysis; document formalization; global term weight; local term weight; natural language processing; principle component analysis; standardization factor; term weights normalizing method; Algorithm design and analysis; Correlation; Covariance matrix; Equations; Mathematical model; Natural language processing; Principal component analysis;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Engineering and Computer Science (ICIECS), 2010 2nd International Conference on
Conference_Location
Wuhan
ISSN
2156-7379
Print_ISBN
978-1-4244-7939-9
Electronic_ISBN
2156-7379
Type
conf
DOI
10.1109/ICIECS.2010.5678139
Filename
5678139