DocumentCode :
191040
Title :
Poster: Efficient record linkage techniques
Author :
Mamun, Abdullah-Al ; Aseltine, Robert ; Rajasekaran, Sanguthevar
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Connecticut, Storrs, CT, USA
fYear :
2014
fDate :
2-4 June 2014
Firstpage :
1
Lastpage :
1
Abstract :
Record linkage, or deduplication, integrates records across multiple data sources. We propose sequential and parallel techniques for record linkage using complete linkage clustering. The key idea of these approaches is to apply radix sorting and blocking on data attributes and to produce a graph-based solution. These methods have been tested on real datasets as well as synthetic datasets. They identify the records belonging to individuals with almost 100% accuracy.
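The abstract names radix sorting, blocking on data attributes, and complete linkage clustering as the core ideas. The Python sketch below illustrates that general pattern under stated assumptions and is not the authors' implementation. The blocking key (first three letters of the last name plus birth year), the edit-distance record comparison, the threshold, and every function name are hypothetical choices made for illustration. The authors' graph-based step and their parallel variant are not reproduced here.

```python
from itertools import combinations


def lsd_radix_sort(keys_and_ids, width):
    """Stable LSD radix sort of (key, record_id) pairs.

    Assumption: keys are ASCII strings; each key is padded with spaces
    (or truncated) to `width` characters before sorting.
    """
    items = [(k.ljust(width)[:width], rid) for k, rid in keys_and_ids]
    for pos in range(width - 1, -1, -1):  # least significant character first
        buckets = [[] for _ in range(128)]
        for key, rid in items:
            code = ord(key[pos])
            if code >= 128:
                raise ValueError("this sketch only handles ASCII keys")
            buckets[code].append((key, rid))
        items = [item for bucket in buckets for item in bucket]
    return items


def edit_distance(a, b):
    """Levenshtein distance (assumed comparison, not taken from the paper)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def complete_linkage(ids, dist, threshold):
    """Agglomerative complete linkage with a cutoff: repeatedly merge the two
    clusters whose maximum pairwise distance is smallest, while it is <= threshold."""
    clusters = [[i] for i in ids]
    while True:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = max(dist(x, y) for x in clusters[a] for y in clusters[b])
            if d <= threshold and (best is None or d < best[0]):
                best = (d, a, b)
        if best is None:
            return clusters
        _, a, b = best  # a < b, so deleting b leaves index a valid
        clusters[a].extend(clusters[b])
        del clusters[b]


def link_records(records, threshold=1):
    """records: dict of record_id -> (first_name, last_name, birth_year)."""
    # Hypothetical blocking key: first 3 letters of last name + birth year.
    keyed = [(last[:3].upper() + str(year), rid)
             for rid, (first, last, year) in records.items()]
    width = max(len(k) for k, _ in keyed)
    ordered = lsd_radix_sort(keyed, width)  # equal keys become adjacent

    def dist(x, y):
        fx, lx, _ = records[x]
        fy, ly, _ = records[y]
        return edit_distance(fx.upper(), fy.upper()) + edit_distance(lx.upper(), ly.upper())

    result, i = [], 0
    while i < len(ordered):  # scan runs of equal keys; each run is one block
        j = i
        while j < len(ordered) and ordered[j][0] == ordered[i][0]:
            j += 1
        block = [rid for _, rid in ordered[i:j]]
        result.extend(complete_linkage(block, dist, threshold))
        i = j
    return result
```

For example, with records 1 ("John", "Smith", 1980), 2 ("Jon", "Smith", 1980), 3 ("John", "Smyth", 1980), 4 ("Mary", "Jones", 1975) and 5 ("Marie", "Jones", 1975), calling `link_records(records, threshold=2)` groups {1, 2} and {4, 5}, while record 3 stays alone because "Smyth" yields a different blocking key. Sorting by the key lets each block be found with a single linear scan, but a strict key can separate true matches, which is the usual cost of blocking.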
Keywords :
graph theory; medical administrative data processing; relational databases; sorting; complete linkage clustering; data attributes; deduplication; efficient record linkage techniques; graph based solution; multiple data sources; parallel techniques; radix blocking; radix sorting; sequential techniques; Accuracy; Clustering algorithms; Computer science; Couplings; Educational institutions; Electronic mail; Public healthcare;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Advances in Bio and Medical Sciences (ICCABS), 2014 IEEE 4th International Conference on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4799-5786-6
Type :
conf
DOI :
10.1109/ICCABS.2014.6863930
Filename :
6863930