• DocumentCode
    1300611
  • Title
    An Unsupervised Approach for Person Name Bipolarization Using Principal Component Analysis
  • Author
    Chen, Chien Chin; Chen, Zhong-Yong; Wu, Chen-Yuan
  • Author_Institution
    Department of Information Management, National Taiwan University, Taipei, Taiwan
  • Volume
    24
  • Issue
    11
  • fYear
    2012
  • Firstpage
    1963
  • Lastpage
    1976
  • Abstract
    A topic is usually associated with a specific time, place, and person(s). Topics that involve bipolar or competing viewpoints tend to be attention-getting and are thus reported in a large number of documents. Identifying the associations between the important persons mentioned in numerous topic documents would help readers comprehend topics more easily. In this paper, we propose an unsupervised approach for identifying bipolar person names in a set of topic documents. Specifically, we employ principal component analysis (PCA) to discover bipolar word usage patterns of person names in the documents, and show that the signs of the entries in the principal eigenvector of PCA spontaneously partition the person names into bipolar groups. To reduce the effect of data sparseness, we introduce two techniques: the weighted correlation coefficient and off-topic block elimination. We also present a timeline system that shows how the intensity and activeness of the identified bipolar person groups develop over time. Empirical evaluations demonstrate the efficacy of the proposed approach in identifying bipolar person names in topic documents, while the generated timelines provide comprehensive storylines of topics.
  • Keywords
    correlation theory; data mining; document handling; eigenvalues and eigenfunctions; principal component analysis; word processing; PCA; bipolar person name identification; bipolar word usage pattern discovery; data sparseness; off-topic block elimination; principal eigenvector; timeline system; topic document; unsupervised approach; weighted correlation coefficient; Correlation; Hidden Markov models; Internet; Matrix decomposition; Symmetric matrices; Web pages; Topic mining; bipolar timeline; sentiment analysis
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Knowledge and Data Engineering
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2011.177
  • Filename
    5989806
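
The sign-based partition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a hypothetical person-by-block occurrence matrix, uses the plain Pearson correlation in place of the paper's weighted correlation coefficient, and omits off-topic block elimination. The function name `bipolarize` and the toy data are invented for illustration.

```python
import numpy as np

def bipolarize(names, counts):
    """Split person names into two groups by eigenvector sign.

    names  : list of person names (hypothetical input)
    counts : array-like of shape (n_names, n_blocks); counts[i][j] is how
             often name i occurs in text block j (assumed representation)
    """
    X = np.asarray(counts, dtype=float)
    # Pearson correlation between the rows (one row per person name).
    # Assumption: every row has non-zero variance, otherwise NaNs appear.
    C = np.corrcoef(X)
    # eigh handles symmetric matrices and returns eigenvalues in
    # ascending order, so the last column is the principal eigenvector.
    _, eigvecs = np.linalg.eigh(C)
    principal = eigvecs[:, -1]
    # The overall sign of an eigenvector is arbitrary, so only the
    # grouping is meaningful, not which group is called "positive".
    positive = [n for n, e in zip(names, principal) if e >= 0]
    negative = [n for n, e in zip(names, principal) if e < 0]
    return positive, negative

# Toy data: A1/A2 tend to co-occur, as do B1/B2, and the pairs avoid
# each other.
names = ["A1", "A2", "B1", "B2"]
counts = [
    [3, 2, 0, 0, 1, 0],
    [2, 3, 0, 1, 0, 0],
    [0, 0, 3, 2, 0, 1],
    [0, 1, 2, 3, 0, 0],
]
group_1, group_2 = bipolarize(names, counts)
print(group_1, group_2)
```

Because the eigenvector's overall sign is arbitrary, the two returned lists may swap between runs or platforms; only the membership of each group should be compared.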