Title :
Name Disambiguation Using Atomic Clusters
Author :
Wang, Feng ; Li, Juanzi ; Tang, Jie ; Zhang, Jing ; Wang, Kehong
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing
Abstract :
Name ambiguity is a critical problem in many applications, particularly in online bibliography systems such as DBLP and CiteSeer. Although several clustering-based methods have been proposed, the problem remains a significant challenge for both the research and industry communities. In this paper, we present a complementary study of the problem from another point of view. We propose an approach for finding atomic clusters to improve the performance of existing clustering-based methods. We conducted experiments on a dataset from a real-world system, Arnetminer.org. Experimental results show that significant improvements can be obtained by using the proposed atomic cluster finding approach (improvements of about +8% and +27%, depending on the clustering method).
Keywords :
bibliographic systems; pattern clustering; text analysis; Arnetminer.org; CiteSeer; DBLP; atomic cluster finding; clustering-based method; name disambiguation; online bibliography systems; Application software; Atomic measurements; Bibliographies; Clustering algorithms; Clustering methods; Computer science; Hidden Markov models; Information management; Proposals; atomic cluster
Conference_Title :
Web-Age Information Management, 2008. WAIM '08. The Ninth International Conference on
Conference_Location :
Zhangjiajie, Hunan
Print_ISBN :
978-0-7695-3185-4
Electronic_ISBN :
978-0-7695-3185-4
DOI :
10.1109/WAIM.2008.96