DocumentCode
3681174
Title
Parallel NoSQL Entity Resolution Approach with MapReduce
Author
Kun Ma;Bo Yang
Author_Institution
Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
fYear
2015
Firstpage
384
Lastpage
389
Abstract
To address the limitations of entity resolution for NoSQL documents, we propose a new parallel NoSQL entity resolution approach based on MapReduce. Although the current MapReduce framework enables efficient parallel execution of entity resolution, it cannot easily find duplicates that fall in adjacent blocks. We therefore investigate a solution called Partition-Sort-Map-Reduce, which finds such duplicates by overlapping the boundary objects of adjacent blocks. Finally, an experimental evaluation on NoSQL breeding data and an analysis of time complexity show the effectiveness and efficiency of the proposed entity resolution approach.
Keywords
"Sorting","Time complexity","Batch production systems","Parallel processing","Artificial intelligence","Tin"
Publisher
IEEE
Conference_Titel
Intelligent Networking and Collaborative Systems (INCOS), 2015 International Conference on
Type
conf
DOI
10.1109/INCoS.2015.16
Filename
7312102