Regional Information Center for Science and Technology - Parallel Top-K Similarity Join Algorithms Using MapReduce

DocumentCode :

2456991

Title :

Parallel Top-K Similarity Join Algorithms Using MapReduce

Author :

Kim, Younghoon ; Shim, Kyuseok

Author_Institution :

Dept. of EECS, Seoul Nat. Univ., Seoul, South Korea

fYear :

2012

fDate :

1-5 April 2012

Firstpage :

510

Lastpage :

521

Abstract :

Many applications require finding the top-k most similar pairs of records in a given database. However, computing such top-k similarity joins is challenging today, because a growing number of applications must deal with vast amounts of data. For such data-intensive applications, parallel execution of programs on large clusters of commodity machines using the MapReduce paradigm has recently received a lot of attention. In this paper, we investigate how top-k similarity join algorithms can benefit from the popular MapReduce framework. We first develop divide-and-conquer and branch-and-bound algorithms. We then propose the all pair partitioning and essential pair partitioning methods to minimize the amount of data transferred between the map and reduce functions. Finally, we perform experiments with both synthetic and real-life data sets. Our performance study confirms the effectiveness and scalability of our MapReduce algorithms.
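The abstract names the techniques but does not describe them. The sketch below only illustrates the problem setting under stated assumptions. It simulates, in plain Python, a generic partition-pair scheme: records are split into m partitions, each reducer compares one pair of partitions and keeps a local top-k, and a final step merges the local lists. The partitioning rule (record id modulo m), the Euclidean distance, and all function names are hypothetical choices for illustration. This is not the authors' all pair partitioning or essential pair partitioning method, and it omits their divide-and-conquer and branch-and-bound algorithms.

```python
import heapq
import itertools
import math
from collections import defaultdict


def distance(a, b):
    # Assumed similarity measure: Euclidean distance (smaller = more similar).
    return math.dist(a, b)


def local_top_k(pairs, k):
    """Return the k smallest (distance, i, j) triples in ascending order."""
    if k <= 0:
        return []
    # Max-heap of the k best triples seen so far, stored negated so that
    # heap[0] is always the worst (largest) triple currently kept.
    heap = []
    for d, i, j in pairs:
        item = (-d, -i, -j)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:  # new triple is smaller than the current worst
            heapq.heapreplace(heap, item)
    return sorted((-nd, -ni, -nj) for nd, ni, nj in heap)


def map_phase(records, m):
    # Hypothetical partitioning: record id modulo m. Each record is sent to
    # every reducer key (p, q), p <= q, that involves its own partition.
    groups = defaultdict(list)
    for rid, vec in enumerate(records):
        p = rid % m
        for q in range(m):
            groups[(min(p, q), max(p, q))].append((rid, p, vec))
    return groups


def reduce_phase(key, values, k):
    # Compare records within one partition (p == q) or across two partitions.
    p, q = key
    if p == q:
        candidates = itertools.combinations(values, 2)
    else:
        left = [v for v in values if v[1] == p]
        right = [v for v in values if v[1] == q]
        candidates = itertools.product(left, right)
    pairs = (
        (distance(a[2], b[2]), min(a[0], b[0]), max(a[0], b[0]))
        for a, b in candidates
    )
    return local_top_k(pairs, k)


def top_k_similarity_join(records, k, m=3):
    # Simulated MapReduce job: map, group by key, reduce each group, then
    # merge the local top-k lists into the global top-k.
    partial = []
    for key, values in map_phase(records, m).items():
        partial.extend(reduce_phase(key, values, k))
    return local_top_k(partial, k)


def brute_force(records, k):
    # Reference answer: compare every pair on a single machine.
    pairs = [
        (distance(records[i], records[j]), i, j)
        for i, j in itertools.combinations(range(len(records)), 2)
    ]
    return sorted(pairs)[:k]
```

For example, with a list of two-dimensional points, `top_k_similarity_join(points, 5, m=4)` should return the same five closest pairs as `brute_force(points, 5)`. Every unordered pair of partitions maps to exactly one reducer key, so no pair is missed or counted twice, but each record is shipped to m reducers; that replication is the kind of map-to-reduce data transfer the abstract says the paper's partitioning methods aim to minimize.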

Keywords :

database management systems; divide and conquer methods; parallel databases; tree searching; MapReduce; MapReduce framework; branch-and-bound algorithms; commodity machines; data transfers; data-intensive applications; database; divide-and-conquer algorithms; pair partitioning methods; parallel program executions; parallel top-k similarity join algorithms; top-k most similar pairs; Approximation algorithms; Arrays; Clustering algorithms; Complexity theory; Euclidean distance; Indexes; Partitioning algorithms;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Engineering (ICDE), 2012 IEEE 28th International Conference on

Conference_Location :

Washington, DC

ISSN :

1063-6382

Print_ISBN :

978-1-4673-0042-1

Type :

conf

DOI :

10.1109/ICDE.2012.87

Filename :

6228110

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2456991