• DocumentCode
    1815250
  • Title
    Comparative study of name disambiguation problem using a scalable blocking-based framework
  • Author
    On, ByungWon ; Kang, Jaewoo ; Lee, Dongwon ; Mitra, Prasenjit
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA
  • fYear
    2005
  • fDate
    7-11 June 2005
  • Firstpage
    344
  • Lastpage
    353
  • Abstract
    In this paper, we consider the problem of ambiguous author names in bibliographic citations, and comparatively study alternative approaches to identify and correct such name variants (e.g., "Vannevar Bush" and "V. Bush"). Our study is based on a scalable two-step framework, where step 1 is to substantially reduce the number of candidates via blocking, and step 2 is to measure the distance of two names via coauthor information. Combining four blocking methods and seven distance measures on four data sets, we present extensive experimental results, and identify combinations that are scalable and effective at disambiguating author names in citations.
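    The two-step framework described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the blocking key (first initial plus last-name initial), the Jaccard distance over coauthor sets, the 0.5 threshold, the function names, and the sample records are all assumptions chosen for illustration, and the paper itself compares several blocking methods and distance measures.

```python
from collections import defaultdict
from itertools import combinations


def block_key(name: str) -> str:
    """Hypothetical blocking key: first initial + last-name initial."""
    parts = name.replace(".", " ").split()
    return (parts[0][0] + parts[-1][0]).lower()


def jaccard_distance(a: set, b: set) -> float:
    """Distance between two coauthor sets (1.0 means no overlap)."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def candidate_pairs(coauthors: dict, threshold: float = 0.5) -> list:
    """Step 1: group names into blocks. Step 2: compare only within a block."""
    blocks = defaultdict(list)
    for name in coauthors:
        blocks[block_key(name)].append(name)

    pairs = []
    for names in blocks.values():
        for x, y in combinations(sorted(names), 2):
            d = jaccard_distance(coauthors[x], coauthors[y])
            if d <= threshold:
                pairs.append((x, y, d))
    return pairs


# Made-up records: author name -> set of (placeholder) coauthor names.
records = {
    "Vannevar Bush": {"A", "B", "C"},
    "V. Bush": {"A", "B"},
    "Vinton Cerf": {"A", "B"},
}
pairs = candidate_pairs(records)
```

    Because names in different blocks are never compared, the number of pairwise distance computations stays small even when the full name list is large, which is the scalability argument the abstract makes for blocking.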
  • Keywords
    bibliographic systems; citation analysis; ambiguous author names; bibliographic citations; blocking methods; data sets; name disambiguation problem; scalable blocking-based framework; Books; Computer science; Error correction; Information retrieval; Information systems; Large-scale systems; Partitioning algorithms; Permission; Portals; Software libraries; blocking; measuring distances; name disambiguation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Libraries, 2005. JCDL '05. Proceedings of the 5th ACM/IEEE-CS Joint Conference on
  • Conference_Location
    Denver, CO
  • Print_ISBN
    1-58113-876-8
  • Type
    conf
  • DOI
    10.1145/1065385.1065463
  • Filename
    4118564