DocumentCode :
1783195
Title :
MapReuse: Reusing Computation in an In-Memory MapReduce System
Author :
Tiwari, D. ; Solihin, Y.
Author_Institution :
Oak Ridge Nat. Lab., Oak Ridge, TN, USA
fYear :
2014
fDate :
19-23 May 2014
Firstpage :
61
Lastpage :
71
Abstract :
The MapReduce programming model is increasingly being adopted for data-intensive high-performance computing. Recently, it has been observed that in data-intensive environments, programs are often run multiple times with either identical or slightly changed input, which creates a significant opportunity for computation reuse. Recognizing this opportunity, researchers have proposed techniques to reuse computation in disk-based MapReduce systems such as Hadoop, but not in in-memory MapReduce (IMMR) systems such as Phoenix. In this paper, we propose a novel technique for computation reuse in IMMR systems, which we refer to as MapReuse. MapReuse detects similarity between inputs by comparing their signatures. It skips recomputing output for a repeated portion of the input, computes output for a new portion of the input, and removes output that corresponds to a deleted portion of the input. MapReuse is built on top of an existing IMMR system, leaving it largely unmodified. MapReuse significantly speeds up IMMR, even when the new input differs from the original input by 25%.
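The reuse idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use the Phoenix API. The chunking into strings, the SHA-256 signature, the word-count map function, and the names `chunk_signature`, `map_word_count`, and `run_with_reuse` are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def chunk_signature(chunk: str) -> str:
    """Signature of one input chunk (SHA-256 is an illustrative assumption)."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def map_word_count(chunk: str) -> Counter:
    """Hypothetical map function: count the words in one chunk."""
    return Counter(chunk.split())


def run_with_reuse(chunks, cache):
    """Run the job over `chunks`, reusing map output cached by signature.

    Returns (total, new_cache, recomputed):
      total      - merged output for the current input
      new_cache  - signature -> map output, only for chunks still present
      recomputed - number of chunks whose map output had to be computed
    """
    total = Counter()
    new_cache = {}
    recomputed = 0
    for chunk in chunks:
        sig = chunk_signature(chunk)
        partial = new_cache.get(sig)
        if partial is None:
            partial = cache.get(sig)          # repeated portion: reuse output
        if partial is None:
            partial = map_word_count(chunk)   # new portion: compute output
            recomputed += 1
        new_cache[sig] = partial
        total.update(partial)                 # merge (adds counts)
    # Chunks deleted from the input never enter new_cache, so their
    # output is dropped from both the result and the next run's cache.
    return total, new_cache, recomputed


if __name__ == "__main__":
    first = ["a b a", "c d", "e"]
    _, cache, _ = run_with_reuse(first, {})
    second = ["a b a", "f g", "e"]  # "c d" deleted, "f g" added
    total, cache, recomputed = run_with_reuse(second, cache)
    print(dict(total), recomputed)
```

In this sketch the merge step is simply repeated over cached partial outputs. How the actual system handles the reduce phase is not stated in the abstract.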
Keywords :
parallel programming; Hadoop; IMMR systems; MapReuse; Phoenix; computation reuse; data intensive high performance computing; disk-based MapReduce systems; in-memory MapReduce system; parallel programming model; Complexity theory; Data structures; Engines; Indexes; Instruction sets; Runtime; Servers;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location :
Phoenix, AZ
ISSN :
1530-2075
Print_ISBN :
978-1-4799-3799-8
Type :
conf
DOI :
10.1109/IPDPS.2014.18
Filename :
6877242