DocumentCode
1815250
Title
Comparative study of name disambiguation problem using a scalable blocking-based framework
Author
On, ByungWon ; Kang, Jaewoo ; Lee, Dongwon ; Mitra, Prasenjit
Author_Institution
Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA
fYear
2005
fDate
7-11 June 2005
Firstpage
344
Lastpage
353
Abstract
In this paper, we consider the problem of ambiguous author names in bibliographic citations, and comparatively study alternative approaches to identify and correct such name variants (e.g., "Vannevar Bush" and "V. Bush"). Our study is based on a scalable two-step framework, where step 1 is to substantially reduce the number of candidates via blocking, and step 2 is to measure the distance of two names via coauthor information. Combining four blocking methods and seven distance measures on four data sets, we present extensive experimental results, and identify combinations that are scalable and effective to disambiguate author names in citations.
Keywords
bibliographic systems; citation analysis; ambiguous author names; bibliographic citations; blocking methods; data sets; name disambiguation problem; scalable blocking-based framework; Books; Computer science; Error correction; Information retrieval; Information systems; Large-scale systems; Partitioning algorithms; Permission; Portals; Software libraries; blocking; measuring distances; name disambiguation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Digital Libraries, 2005. JCDL '05. Proceedings of the 5th ACM/IEEE-CS Joint Conference on
Conference_Location
Denver, CO
Print_ISBN
1-58113-876-8
Type
conf
DOI
10.1145/1065385.1065463
Filename
4118564