DocumentCode :
2936862
Title :
D-Dupe: An Interactive Tool for Entity Resolution in Social Networks
Author :
Bilgic, Mustafa ; Licamele, Louis ; Getoor, Lise ; Shneiderman, Ben
Author_Institution :
Maryland Univ., College Park, MD
fYear :
2006
fDate :
Oct. 31 2006-Nov. 2 2006
Firstpage :
43
Lastpage :
50
Abstract :
Visualizing and analyzing social networks is a challenging problem that has been receiving growing attention. An important first step, before analysis can begin, is ensuring that the data is accurate. A common data quality problem is that the data may inadvertently contain several distinct references to the same underlying entity; the process of reconciling these references is called entity-resolution. D-Dupe is an interactive tool that combines data mining algorithms for entity resolution with a task-specific network visualization. Users cope with complexity of cleaning large networks by focusing on a small subnetwork containing a potential duplicate pair. The subnetwork highlights relationships in the social network, making the common relationships easy to visually identify. D-Dupe users resolve ambiguities either by merging nodes or by marking them distinct. The entity resolution process is iterative: as pairs of nodes are resolved, additional duplicates may be revealed; therefore, resolution decisions are often chained together. We give examples of how users can flexibly apply sequences of actions to produce a high quality entity resolution result. We illustrate and evaluate the benefits of D-Dupe on three bibliographic collections. Two of the datasets had already been cleaned, and therefore should not have contained duplicates; despite this fact, many duplicates were rapidly identified using D-Dupe's unique combination of entity resolution algorithms within a task-specific visual interface.
Keywords :
data mining; interactive systems; social sciences computing; user interfaces; D-Dupe interactive tool; data mining algorithm; data quality problem; entity resolution; entity-resolution; social network analysis; social network visualization; task-specific network visualization; task-specific visual interface; Data cleaning and integration; H.2.8 [Information Systems]: Database Applications - Data mining; H.5.2 [Information Interfaces and Presentation]: User Interfaces - User-centered design; visual analytics; visual data mining
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Visual Analytics Science And Technology, 2006 IEEE Symposium On
Conference_Location :
Baltimore, MD
Print_ISBN :
1-4244-0591-2
Electronic_ISBN :
1-4244-0592-0
Type :
conf
DOI :
10.1109/VAST.2006.261429
Filename :
4035746