• DocumentCode
    3125523
  • Title

    ADANA: Active Name Disambiguation

  • Author

    Wang, Xuezhi ; Tang, Jie ; Cheng, Hong ; Yu, Philip S.

  • fYear
    2011
  • fDate
    11-14 Dec. 2011
  • Firstpage
    794
  • Lastpage
    803
  • Abstract
    Name ambiguity has long been viewed as a challenging problem in many applications, such as scientific literature management, people search, and social network analysis. When we search for a person name in these systems, many documents (e.g., papers, web pages) containing that person's name may be returned. It is hard to determine which documents are about the person we care about. Although much research has been conducted, the problem remains largely unsolved, especially with the rapid growth of people information available on the Web. In this paper, we study this problem from a new perspective and propose the ADANA method for disambiguating person names via active user interactions. In ADANA, we first introduce a pairwise factor graph (PFG) model for person name disambiguation. The model is flexible and can be easily extended by incorporating various features. Based on the PFG model, we propose an active name disambiguation algorithm, aiming to improve the disambiguation performance by maximizing the utility of the user's correction. Experimental results on three different genres of data sets show that with only a few user corrections, the error rate of name disambiguation can be reduced to 3.1%. A real system has been developed based on the proposed method and is available online.
  • Keywords
    Internet; graph theory; information retrieval; user interfaces; ADANA method; PFG model; active name disambiguation algorithm; active person name disambiguation; active user interactions; name ambiguity; pairwise factor graph model; user correction; utility maximizing; Accuracy; Approximation algorithms; Clustering algorithms; Correlation; Educational institutions; Inference algorithms; Web pages; Active Learning; Digital Library; Name Disambiguation; Social Network Analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2011 IEEE 11th International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4577-2075-8
  • Type

    conf

  • DOI
    10.1109/ICDM.2011.19
  • Filename
    6137284