Title of article :
Cross-Checking Multiple Data Sources Using Multiway Join in MapReduce
Author/Authors :
Afrati, Foto (National Technical University of Athens, Athens, Greece); Momani, Zaid (National Technical University of Athens, Athens, Greece); Stasinopoulos, Nikos (National Technical University of Athens, Athens, Greece)
Pages :
12
From page :
1
To page :
12
Abstract :
As data sources accumulate information and data size escalates, it becomes increasingly difficult to maintain the correctness and validity of these datasets. Therefore, tools must emerge to facilitate this daunting task. Fact checking usually involves a large number of data sources that describe the same thing, but we do not know which of them holds the correct information, or which holds any information at all about the query of interest. A join among all or some of the data sources can guide the fact-checking process. However, when this join is performed in a distributed computational environment such as MapReduce, it is not obvious how to efficiently distribute the records of the data sources to the reduce tasks so that any subset of them can be joined in a single MapReduce job. To this end, we propose an efficient approach that uses the multiway join to cross-check these data sources in a single round.
Keywords :
MapReduce, Cross-Checking, Data Sources
Journal title :
Scientific Programming
Serial Year :
2017
Full Text URL :
Record number :
2607700