Title :
Model Selection Strategies for Author Disambiguation
Author :
Kern, Roman ; Zechner, Mario ; Granitzer, Michael
Author_Institution :
Inst. of Knowledge Manage., Graz Univ. of Technol., Graz, Austria
fDate :
Aug. 29, 2011 - Sept. 2, 2011
Abstract :
Author disambiguation is a prerequisite for using bibliographic metadata in citation analysis. Automatic disambiguation algorithms mostly rely on cluster-based strategies to identify unique authors from their names and publications. However, most of these approaches require the correct number of unique authors to be known a priori, which is rarely the case in real-world settings. In this publication we analyse cluster-based disambiguation strategies and develop a model selection method that estimates the number of distinct authors from co-authorship networks. We show that, given clean textual features, the developed model selection method accurately estimates the number of unique authors.
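As a rough illustration of what "estimating the number of distinct authors from co-authorship networks" can mean, the sketch below implements a common naive baseline, not the model selection method from this paper (the abstract does not give its details). Assumptions: all publications carry the same ambiguous author name, each publication is given as a set of its other co-author names, and two publications are attributed to the same person whenever they share a co-author. The estimate is then the number of connected components. The function name `estimate_author_count` and the input format are hypothetical.

```python
def estimate_author_count(publications: list[set[str]]) -> int:
    """Naive baseline (hypothetical): count connected components of publications
    linked by shared co-authors; each component is treated as one distinct author."""
    # Union-find over publication indices.
    parent = list(range(len(publications)))

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    # Inverted index: first publication seen for each co-author name.
    # Every later publication sharing that co-author is merged with it.
    first_seen: dict[str, int] = {}
    for idx, coauthors in enumerate(publications):
        for name in coauthors:
            if name in first_seen:
                union(idx, first_seen[name])
            else:
                first_seen[name] = idx

    # One estimated author per distinct root.
    return len({find(i) for i in range(len(publications))})


pubs = [{"A. Smith", "B. Lee"}, {"B. Lee"}, {"C. Wu"}, {"C. Wu", "D. Ng"}, set()]
print(estimate_author_count(pubs))  # 3
```

Such a baseline tends to split one person into several when they change collaborators and to merge different people who share a prolific co-author, which is one reason to combine network structure with textual features when choosing the number of clusters.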
Keywords :
bibliographic systems; citation analysis; meta data; author disambiguation; automatic disambiguation algorithms; bibliographic metadata; cluster-based disambiguation strategies; co-authorship networks; model selection strategies; textual features; Clustering algorithms; Entropy; Feature extraction; Joining processes; Partitioning algorithms; Probability; Web search; model selection
Conference_Title :
22nd International Workshop on Database and Expert Systems Applications (DEXA), 2011
Conference_Location :
Toulouse
Print_ISBN :
978-1-4577-0982-1
DOI :
10.1109/DEXA.2011.54