Regional Information Center for Science and Technology - Comparing map-reduce and FREERIDE for data-intensive applications

DocumentCode :

1926832

Title :

Comparing map-reduce and FREERIDE for data-intensive applications

Author :

Jiang, Wei ; Ravi, Vignesh T. ; Agrawal, Gagan

Author_Institution :

Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA

fYear :

2009

fDate :

Aug. 31 2009-Sept. 4 2009

Firstpage :

Lastpage :

Abstract :

Map-reduce has been a topic of much interest in the last 2-3 years. While it is well accepted that the map-reduce APIs enable significantly easier programming, the performance aspects of the use of map-reduce are less well understood. This paper focuses on comparing the map-reduce paradigm with a system that was developed earlier at Ohio State, FREERIDE (FRamework for Rapid Implementation of Datamining Engines). The API and the functionality offered by FREERIDE have many similarities with the map-reduce API. However, there are some differences in the API. Moreover, while FREERIDE was motivated by data mining computations, map-reduce was motivated by searching, sorting, and related applications in a data center. We compare the programming APIs and performance of the Hadoop implementation of map-reduce with FREERIDE. For our study, we have taken three data mining algorithms: k-means clustering, apriori association mining, and k-nearest neighbor search. We have also included a simple data scanning application, word-count. The main observations from our results are as follows. For the three data mining applications we have considered, FREERIDE outperformed Hadoop by a factor of 5 or more. For word-count, Hadoop is better by a factor of up to 2. With increasing dataset sizes, the relative performance of Hadoop becomes better. Overall, it seems that Hadoop has significant overheads related to initialization, I/O, and sorting of (key, value) pairs. Thus, despite an easy-to-program API, Hadoop's map-reduce does not appear very suitable for data mining computations on modest-sized datasets.

Keywords :

application program interfaces; data mining; API; FREERIDE; data mining engines; data-intensive applications; map-reduce; rapid implementation; Application software; Classification tree analysis; Clustering algorithms; Computer science; Data analysis; Data mining; Image analysis; Large-scale systems; Productivity; Sorting;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on

Conference_Location :

New Orleans, LA

ISSN :

1552-5244

Print_ISBN :

978-1-4244-5011-4

Electronic_ISBN :

1552-5244

Type :

conf

DOI :

10.1109/CLUSTR.2009.5289199

Filename :

5289199

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1926832