Title :
Poster: Efficient record linkage techniques
Author :
Mamun, Abdullah-Al ; Aseltine, Robert ; Rajasekaran, Sanguthevar
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Connecticut, Storrs, CT, USA
Abstract :
Record linkage, or deduplication, integrates records across multiple data sources. We propose sequential and parallel techniques for record linkage using complete linkage clustering. The key idea of these approaches is to radix sort and block the records on data attributes and then produce a graph-based solution. These methods have been tested on real as well as synthetic datasets. They identify records belonging to the same individual with almost 100% accuracy.
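The blocking and complete linkage steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the prefix-based blocking key, the edit-distance threshold, and the function names (`block_by_prefix`, `complete_linkage`, `link_records`) are all assumptions made for illustration, and the radix sort, graph construction, and parallel variant mentioned in the abstract are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def block_by_prefix(records, prefix_len=2):
    """Group record indices by a prefix key (a hypothetical blocking rule)."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[rec[:prefix_len]].append(idx)
    return blocks


def complete_linkage(records, members, threshold):
    """Agglomerative clustering inside one block.

    Two clusters are merged only if the LARGEST pairwise distance
    between their members is within the threshold (complete linkage).
    """
    clusters = [[i] for i in members]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = max(edit_distance(records[i], records[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        if best[0] > threshold:
            break  # no remaining pair of clusters is close enough
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe: combinations guarantees a < b
    return clusters


def link_records(records, prefix_len=2, threshold=2):
    """Block the records, cluster each block, return sorted index groups."""
    result = []
    blocks = block_by_prefix(records, prefix_len)
    for key in sorted(blocks):
        for cluster in complete_linkage(records, blocks[key], threshold):
            result.append(sorted(cluster))
    return sorted(result)


if __name__ == "__main__":
    # Made-up example records, not data from the poster.
    sample = [
        "john smith 1980",
        "jon smith 1980",
        "john smyth 1980",
        "mary jones 1975",
        "mary jonse 1975",
        "alice wong 1990",
    ]
    print(link_records(sample, threshold=2))
    print(link_records(sample, threshold=1))
```

With a threshold of 1, the third record stays separate from the first two even though it is within distance 1 of the first record, because its distance to the second record is 2. That is the behavior that distinguishes complete linkage from single linkage.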
Keywords :
graph theory; medical administrative data processing; relational databases; sorting; complete linkage clustering; data attributes; deduplication; efficient record linkage techniques; graph based solution; multiple data sources; parallel techniques; radix blocking; radix sorting; sequential techniques; Accuracy; Clustering algorithms; Computer science; Couplings; Educational institutions; Electronic mail; Public healthcare
Conference_Title :
Computational Advances in Bio and Medical Sciences (ICCABS), 2014 IEEE 4th International Conference on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4799-5786-6
DOI :
10.1109/ICCABS.2014.6863930