DocumentCode :
3462109
Title :
Two supervised learning approaches for name disambiguation in author citations
Author :
Han, Hui ; Giles, Lee ; Zha, Hongyuan ; Li, Cheng ; Tsioutsiouliklis, Kostas
Author_Institution :
Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
fYear :
2004
fDate :
7-11 June 2004
Firstpage :
296
Lastpage :
305
Abstract :
Due to name abbreviations, identical names, name misspellings, and pseudonyms in publications or bibliographies (citations), an author may have multiple names and multiple authors may share the same name. Such name ambiguity affects the performance of document retrieval, Web search, and database integration, and may cause improper attribution to authors. We investigate two supervised learning approaches to disambiguate authors in the citations. One approach uses the naive Bayes probability model, a generative model; the other uses support vector machines (SVMs) [V. Vapnik (1995)] and the vector space representation of citations, a discriminative model. Both approaches utilize three types of citation attributes: coauthor names, the title of the paper, and the title of the journal or proceedings. We illustrate these two approaches on two types of data: one collected from the Web, mainly publication lists from homepages, and the other collected from the DBLP citation databases.
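The generative approach named in the abstract can be illustrated with a minimal sketch. This is not the paper's exact model: it assumes a plain multinomial naive Bayes with add-one smoothing over the three attribute types (coauthor names, paper title words, venue title words). The author labels, citations, and all function and class names below are hypothetical and exist only for illustration.

```python
import math
from collections import Counter, defaultdict


def features(citation):
    """Flatten a citation into attribute-tagged tokens (hypothetical helper)."""
    feats = [f"coauthor={name.lower()}" for name in citation["coauthors"]]
    feats += [f"title={w}" for w in citation["title"].lower().split()]
    feats += [f"venue={w}" for w in citation["venue"].lower().split()]
    return feats


class NaiveBayesDisambiguator:
    """Simplified multinomial naive Bayes; an assumption, not the paper's model."""

    def fit(self, citations, author_ids):
        self.class_counts = Counter(author_ids)
        self.feat_counts = defaultdict(Counter)
        for citation, author in zip(citations, author_ids):
            self.feat_counts[author].update(features(citation))
        self.vocab = {f for counts in self.feat_counts.values() for f in counts}
        return self

    def log_posterior(self, citation, author):
        # log P(author) + sum of log P(token | author), with add-one smoothing.
        total = sum(self.feat_counts[author].values())
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[author] / n_docs)
        for f in features(citation):
            count = self.feat_counts[author][f]
            score += math.log((count + 1) / (total + len(self.vocab) + 1))
        return score

    def predict(self, citation):
        return max(self.class_counts, key=lambda a: self.log_posterior(citation, a))


# Toy training data: two hypothetical authors who share one name string.
train = [
    ({"coauthors": ["A. Kumar"], "title": "Kernel methods for text classification",
      "venue": "Machine Learning Conference"}, "smith_ml"),
    ({"coauthors": ["A. Kumar", "B. Lee"], "title": "Support vector learning for documents",
      "venue": "Machine Learning Conference"}, "smith_ml"),
    ({"coauthors": ["C. Rossi"], "title": "Protein folding simulation",
      "venue": "Journal of Biology"}, "smith_bio"),
    ({"coauthors": ["C. Rossi"], "title": "Gene expression in yeast",
      "venue": "Journal of Biology"}, "smith_bio"),
]

model = NaiveBayesDisambiguator().fit([c for c, _ in train], [a for _, a in train])

query = {"coauthors": ["A. Kumar"], "title": "Text learning with kernels",
         "venue": "Machine Learning Workshop"}
print(model.predict(query))
```

The discriminative alternative described in the abstract would instead map the same tagged tokens into a vector space and train an SVM on those vectors; that variant is not sketched here.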
Keywords :
Bayes methods; Internet; bibliographic systems; citation analysis; data integrity; information retrieval; learning (artificial intelligence); probability; search engines; support vector machines; DBLP citation databases; Web publication lists; Web search; author citations; bibliographies; citation vector space representation; coauthor names; database integration; discriminative model; document retrieval; homepages; identical names; journal title; naive Bayes probability model; name ambiguity; name disambiguation; name misspellings; publication pseudonyms; supervised learning approach; support vector machines; Bibliographies; Computer science; Databases; Information retrieval; Permission; Public healthcare; Software libraries; Statistics; Supervised learning; Web search;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Digital Libraries, 2004. Proceedings of the 2004 Joint ACM/IEEE Conference on
Print_ISBN :
1-58113-832-6
Type :
conf
DOI :
10.1109/JCDL.2004.1336139
Filename :
1336139