• DocumentCode
    2043122
  • Title

    Multi-document Chinese name disambiguation based on Latent Semantic Analysis

  • Author

    Wu, Chengrong ; Gong, Linghui ; Zeng, Jianping

  • Author_Institution
    Sch. of Comput. Sci., Fudan Univ., Shanghai, China
  • Volume
    5
  • fYear
    2010
  • fDate
    10-12 Aug. 2010
  • Firstpage
    2367
  • Lastpage
    2371
  • Abstract
    Name disambiguation has received considerable attention as an important subtask of natural language processing (NLP). Given many potential references to person entities, the goal is to determine, for each reference in context, the person entity it most likely refers to. However, much of the research in this field either focuses on name disambiguation within a single text or applies machine learning models to multiple documents without considering semantics. In this paper we propose a new algorithm based on Latent Semantic Analysis (LSA) for multi-document disambiguation of Chinese names. The method employs Singular Value Decomposition (SVD) to reduce the original high-dimensional text space to a comparatively low-dimensional semantic space and then clusters candidate reference words in the semantic space to obtain the result. Experiments on a real-world dataset collected from a BBS site show that the proposed method produces reasonable results.
  • Keywords
    learning (artificial intelligence); natural language processing; text analysis; BBS site; high dimensional text space; latent semantic analysis; machine learning model; multidocument Chinese name disambiguation; singular value decomposition; Algorithm design and analysis; Clustering algorithms; Computational linguistics; Context; Machine learning algorithms; Semantics; Tagging; LSA; SVD; name disambiguation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery (FSKD), 2010 Seventh International Conference on
  • Conference_Location
    Yantai, Shandong
  • Print_ISBN
    978-1-4244-5931-5
  • Type

    conf

  • DOI
    10.1109/FSKD.2010.5569867
  • Filename
    5569867
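
The two-stage pipeline described in the abstract (SVD-based reduction to a semantic space, then clustering in that space) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy term-by-mention count matrix, the rank `k = 2`, the choice to cluster mentions rather than reference words, and the greedy cosine-threshold clustering, which stands in for whatever clustering algorithm the authors actually used. The helper names `lsa_embed` and `cluster_by_cosine` are hypothetical.

```python
import numpy as np


def lsa_embed(term_mention, k):
    """Map each column (one name mention) to k latent coordinates via truncated SVD."""
    _, s, vt = np.linalg.svd(term_mention, full_matrices=False)
    # Row i of the result holds mention i's coordinates, scaled by the singular values.
    return vt[:k].T * s[:k]


def cluster_by_cosine(vectors, threshold):
    """Greedy one-pass clustering (an illustrative stand-in, not the paper's method).

    A vector joins the most similar existing cluster if the cosine similarity
    to that cluster's summed direction reaches `threshold`; otherwise it
    starts a new cluster.
    """
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    centroids, labels = [], []
    for v in unit:
        sims = [float(v @ (c / np.linalg.norm(c))) for c in centroids]
        if sims and max(sims) >= threshold:
            best = int(np.argmax(sims))
            centroids[best] = centroids[best] + v
            labels.append(best)
        else:
            centroids.append(v.copy())
            labels.append(len(centroids) - 1)
    return labels


# Hypothetical counts: rows are context terms, columns are four mentions
# of the same ambiguous name found in different documents.
#                        m0 m1 m2 m3
term_mention = np.array([
    [2, 1, 0, 0],   # "university"
    [1, 0, 0, 0],   # "lecture"
    [0, 1, 0, 0],   # "paper"
    [0, 0, 1, 1],   # "match"
    [0, 0, 1, 0],   # "team"
    [0, 0, 0, 1],   # "goal"
], dtype=float)

embedding = lsa_embed(term_mention, k=2)
labels = cluster_by_cosine(embedding, threshold=0.5)
print(labels)
```

On this toy matrix the first two mentions share academic context terms and the last two share sports context terms, so the sketch places them in two separate groups, each group standing for one person entity. A real system would likely need term weighting, a tuned rank, and a more robust clustering step.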