• DocumentCode
    1896376
  • Title
    A Principle Component Analysis Based Method to Normalize Term Weights
  • Author
    Xia, Tian ; Chai, Yanmei
  • Author_Institution
    Dept. of Comput. & Inf. Sci., Shanghai Second Polytech. Univ., Shanghai, China
  • fYear
    2010
  • fDate
    25-26 Dec. 2010
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Term weighting is a significant step in document formalization in natural language processing, and it strongly affects the accuracy of natural language processing systems. A term weight consists of three parts: a global term weight, a local term weight, and a standardization factor. Many term weighting algorithms have been proposed to address each part, and the final term weight is currently computed as the product of the results of multiple term weighting algorithms. However, the results of different term weighting algorithms are correlated with each other, which indicates that they carry redundant, overlapping information. Simply multiplying the results therefore leads to an inaccurate final term weight. This paper proposes a Principal Component Analysis (PCA) based term weight normalizing method, which removes the redundant, overlapping information and yields a more accurate final term weight.
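    The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea, assuming that each term has scores from several correlated weighting schemes and that the combined weight is the projection onto the first principal component of the standardized scores. The function names, the sample scores, and the choice of a single component are illustrative assumptions, not the authors' method.

```python
import math

def standardize(columns):
    """Center each score column to mean 0 and scale it to unit sample variance."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        std = math.sqrt(var) or 1.0  # leave constant columns at zero
        out.append([(x - mean) / std for x in col])
    return out

def covariance_matrix(columns):
    """Sample covariance between columns that are already centered."""
    n = len(columns[0])
    k = len(columns)
    return [[sum(columns[i][t] * columns[j][t] for t in range(n)) / (n - 1)
             for j in range(k)] for i in range(k)]

def first_principal_component(cov, iterations=200):
    """Dominant eigenvector of cov by power iteration.

    Starts from a uniform vector, which suits positively correlated
    schemes; it can fail if that vector is orthogonal to the answer.
    """
    k = len(cov)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def combine_weights(score_columns):
    """Project standardized scores onto the first principal component."""
    z = standardize(score_columns)
    pc1 = first_principal_component(covariance_matrix(z))
    n = len(z[0])
    return [sum(pc1[i] * z[i][t] for i in range(len(z))) for t in range(n)]

# Hypothetical scores for four terms from two correlated weighting schemes.
scheme_a = [0.1, 0.4, 0.5, 0.9]
scheme_b = [0.2, 0.3, 0.6, 0.8]
combined = combine_weights([scheme_a, scheme_b])
```

    Compared with multiplying the two scores, the projection counts the information shared by both schemes once rather than twice. The resulting values are centered around zero, so they would need rescaling before being used as non-negative weights.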
  • Keywords
    document handling; natural language processing; principal component analysis; document formalization; global term weight; local term weight; natural language processing; principle component analysis; standardization factor; term weights normalizing method; Algorithm design and analysis; Correlation; Covariance matrix; Equations; Mathematical model; Natural language processing; Principal component analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Engineering and Computer Science (ICIECS), 2010 2nd International Conference on
  • Conference_Location
    Wuhan
  • ISSN
    2156-7379
  • Print_ISBN
    978-1-4244-7939-9
  • Electronic_ISBN
    2156-7379
  • Type
    conf
  • DOI
    10.1109/ICIECS.2010.5678139
  • Filename
    5678139