• DocumentCode
    3681174
  • Title

    Parallel NoSQL Entity Resolution Approach with MapReduce

  • Author

    Kun Ma;Bo Yang

  • Author_Institution
    Shandong Provincial Key Lab. of Network Based Intell. Comput., Univ. of Jinan, Jinan, China
  • fYear
    2015
  • Firstpage
    384
  • Lastpage
    389
  • Abstract
    To address the limitations of entity resolution for NoSQL documents, we propose a new parallel NoSQL entity resolution approach based on MapReduce. Although the current MapReduce framework enables efficient parallel execution of entity resolution, it cannot easily find duplicates that fall in adjacent blocks. We therefore investigate a possible solution, called Partition-Sort-Map-Reduce, that finds such duplicates by overlapping boundary objects in adjacent blocks. Finally, an experimental evaluation on NoSQL breeding data and an analysis of time complexity show the high effectiveness and efficiency of the proposed entity resolution approach.
  • Keywords
    "Sorting","Time complexity","Batch production systems","Parallel processing","Artificial intelligence","Tin"
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Networking and Collaborative Systems (INCOS), 2015 International Conference on
  • Type

    conf

  • DOI
    10.1109/INCoS.2015.16
  • Filename
    7312102
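
  • Illustrative sketch
    The abstract describes finding duplicates that straddle adjacent blocks by overlapping boundary objects. The following is a minimal single-process sketch of that general idea, not the authors' implementation: it assumes a sorted-neighborhood style comparison with a fixed window, and every name in it (similar, partition_sorted, match_block, resolve, the window and threshold parameters) is hypothetical. The similarity measure is Python's standard difflib, swapped in purely for illustration.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    # Assumed similarity test: difflib ratio against a fixed threshold.
    return SequenceMatcher(None, a, b).ratio() >= threshold


def partition_sorted(records, key, num_blocks):
    # "Partition" and "Sort": order records by key, then cut into
    # contiguous blocks of (almost) equal size.
    ordered = sorted(records, key=key)
    size = max(1, -(-len(ordered) // num_blocks))  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def match_block(block, tail, window, key):
    # "Map": slide a window over the block. The tail holds boundary
    # records copied from the next block, so pairs that straddle the
    # boundary are still compared. Each pair starts in its own block,
    # which avoids reporting the same pair twice.
    extended = block + tail
    pairs = set()
    for i in range(len(block)):
        for j in range(i + 1, min(i + window, len(extended))):
            if similar(key(extended[i]), key(extended[j])):
                pairs.add((extended[i], extended[j]))
    return pairs


def resolve(records, key=str.lower, num_blocks=2, window=3):
    blocks = partition_sorted(records, key, num_blocks)
    result = set()
    for i, block in enumerate(blocks):
        # Overlap: borrow the first window-1 records of the next block.
        # Simplification: blocks shorter than window-1 are not chained.
        tail = blocks[i + 1][:window - 1] if i + 1 < len(blocks) else []
        # "Reduce": union the candidate pairs from all blocks.
        result |= match_block(block, tail, window, key)
    return result


names = ["alice", "bob", "carol", "carole", "dave", "eve"]
duplicates = resolve(names, num_blocks=2, window=3)
```

    In this toy input, "carol" and "carole" land in different blocks after partitioning, so they are only compared because the second block's leading records are copied into the first block's window. In a real MapReduce job each block would be handled by a separate task; the loop in resolve stands in for that.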