DocumentCode :
1926244
Title :
HPMR: Prefetching and pre-shuffling in shared MapReduce computation environment
Author :
Seo, Sangwon ; Jang, Ingook ; Woo, Kyungchang ; Kim, Inkyo ; Kim, Jin-Soo ; Maeng, Seungryoul
Author_Institution :
Comput. Sci. Dept., Korea Adv. Inst. of Sci. & Technol. (KAIST), Daejeon, South Korea
fYear :
2009
fDate :
Aug. 31 2009-Sept. 4 2009
Firstpage :
1
Lastpage :
8
Abstract :
MapReduce is a programming model that supports distributed and parallel processing for large-scale data-intensive applications such as machine learning, data mining, and scientific simulation. Hadoop is an open-source implementation of the MapReduce programming model. Many companies, including Yahoo!, Amazon, and Facebook, use Hadoop to perform various data mining tasks on large-scale data sets such as user search logs and visit logs. In these cases, it is very common for multiple users to share the same computing resources because of practical considerations about cost, system utilization, and manageability. However, Hadoop assumes that all cluster nodes are dedicated to a single user, and so fails to guarantee high performance in a shared MapReduce computation environment. In this paper, we propose two optimization schemes, prefetching and pre-shuffling, which improve overall performance in the shared environment while retaining compatibility with native Hadoop. The proposed schemes are implemented in native Hadoop-0.18.3 as a plug-in component called HPMR (high performance MapReduce engine). Our evaluation on the Yahoo! Grid platform with three different workloads and seven types of test sets from Yahoo! shows that HPMR reduces the execution time by up to 73%.
Keywords :
data mining; distributed processing; optimisation; storage management; Hadoop; MapReduce programming model; high performance MapReduce engine; optimization scheme; parallel processing; prefetching scheme; preshuffling scheme; Computational modeling; Costs; Facebook; Large-scale systems; Machine learning; Open source software; Parallel programming; Prefetching
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on
Conference_Location :
New Orleans, LA
ISSN :
1552-5244
Print_ISBN :
978-1-4244-5011-4
Electronic_ISBN :
1552-5244
Type :
conf
DOI :
10.1109/CLUSTR.2009.5289171
Filename :
5289171