• DocumentCode
    2936862
  • Title
    D-Dupe: An Interactive Tool for Entity Resolution in Social Networks
  • Author
    Bilgic, Mustafa ; Licamele, Louis ; Getoor, Lise ; Shneiderman, Ben
  • Author_Institution
    Maryland Univ., College Park, MD
  • fYear
    2006
  • fDate
    Oct. 31 2006-Nov. 2 2006
  • Firstpage
    43
  • Lastpage
    50
  • Abstract
    Visualizing and analyzing social networks is a challenging problem that has been receiving growing attention. An important first step, before analysis can begin, is ensuring that the data is accurate. A common data quality problem is that the data may inadvertently contain several distinct references to the same underlying entity; the process of reconciling these references is called entity resolution. D-Dupe is an interactive tool that combines data mining algorithms for entity resolution with a task-specific network visualization. Users cope with the complexity of cleaning large networks by focusing on a small subnetwork containing a potential duplicate pair. The subnetwork highlights relationships in the social network, making common relationships easy to identify visually. D-Dupe users resolve ambiguities either by merging nodes or by marking them as distinct. The entity resolution process is iterative: as pairs of nodes are resolved, additional duplicates may be revealed; therefore, resolution decisions are often chained together. We give examples of how users can flexibly apply sequences of actions to produce a high-quality entity resolution result. We illustrate and evaluate the benefits of D-Dupe on three bibliographic collections. Two of the datasets had already been cleaned and therefore should not have contained duplicates; despite this, many duplicates were rapidly identified using D-Dupe's unique combination of entity resolution algorithms within a task-specific visual interface.
  • Keywords
    data mining; interactive systems; social sciences computing; user interfaces; D-Dupe interactive tool; data mining algorithm; data quality problem; entity resolution; entity-resolution; social network analysis; social network visualization; task-specific network visualization; task-specific visual interface; Data cleaning and integration; H.2.8 [Information Systems]: Database Applications - Data mining; H.5.2 [Information Interfaces and Presentation]: User Interfaces - User-centered design; user interfaces; visual analytics; visual data mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Visual Analytics Science And Technology, 2006 IEEE Symposium On
  • Conference_Location
    Baltimore, MD
  • Print_ISBN
    1-4244-0591-2
  • Electronic_ISBN
    1-4244-0592-0
  • Type
    conf
  • DOI
    10.1109/VAST.2006.261429
  • Filename
    4035746