DocumentCode :
1926832
Title :
Comparing map-reduce and FREERIDE for data-intensive applications
Author :
Jiang, Wei ; Ravi, Vignesh T. ; Agrawal, Gagan
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
fYear :
2009
fDate :
Aug. 31 2009-Sept. 4 2009
Firstpage :
1
Lastpage :
10
Abstract :
Map-reduce has been a topic of much interest in the last 2-3 years. While it is well accepted that the map-reduce APIs enable significantly easier programming, the performance aspects of the use of map-reduce are less well understood. This paper focuses on comparing the map-reduce paradigm with a system that was developed earlier at Ohio State, FREERIDE (FRamework for Rapid Implementation of Datamining Engines). The API and the functionality offered by FREERIDE has many similarities with the map-reduce API. However, there are some differences in the API. Moreover, while FREERIDE was motivated by data mining computations, map-reduce was motivated by searching, sorting, and related applications in a data-center. We compare the programming APIs and performance of the Hadoop implementation of map-reduce with FREERIDE. For our study, we have taken three data mining algorithms, which are k-means clustering, apriori association mining, and k-nearest neighbor search. We have also included a simple data scanning application, word-count. The main observations from our results are as follows. For the three data mining applications we have considered, FREERIDE outperformed Hadoop by a factor of 5 or more. For word-count, Hadoop is better by a factor of up to 2. With increasing dataset sizes, the relative performance of Hadoop becomes better. Overall, it seems that Hadoop has significant overheads related to initialization, I/O, and sorting of (key, value) pairs. Thus, despite an easy to program API, Hadoop´s map-reduce does not appear very suitable for data mining computations on modest-sized datasets.
Keywords :
application program interfaces; data mining; API; FREERIDE; data mining engines; data-intensive applications; map-reduce; rapid implementation; Application software; Classification tree analysis; Clustering algorithms; Computer science; Data analysis; Data mining; Image analysis; Large-scale systems; Productivity; Sorting;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on
Conference_Location :
New Orleans, LA
ISSN :
1552-5244
Print_ISBN :
978-1-4244-5011-4
Electronic_ISBN :
1552-5244
Type :
conf
DOI :
10.1109/CLUSTR.2009.5289199
Filename :
5289199
Link To Document :
بازگشت