DocumentCode :
3717143
Title :
Performance characterization and acceleration of in-memory file systems for Hadoop and Spark applications on HPC clusters
Author :
Nusrat Sharmin Islam; Md. Wasi-ur-Rahman; Xiaoyi Lu; Dipti Shankar; Dhabaleswar K. Panda
Author_Institution :
Department of Computer Science and Engineering, The Ohio State University
fYear :
2015
Firstpage :
243
Lastpage :
252
Abstract :
For data-intensive computing, the low throughput of existing disk-bound storage systems is a major bottleneck. The recent emergence of in-memory file systems with heterogeneous storage support mitigates this problem to a great extent. Parallel programming frameworks, e.g., Hadoop MapReduce and Spark, are increasingly being run on such high-performance file systems. However, no comprehensive study has analyzed the impact of in-memory file systems on various Big Data applications. This paper characterizes two file systems from the literature that support in-memory and heterogeneous storage, Tachyon [17] and Triple-H [13], and discusses the impact of these two architectures on the performance and fault tolerance of Hadoop MapReduce and Spark applications. We present a complete methodology for evaluating MapReduce and Spark workloads on top of in-memory file systems and provide insights into the interactions of different system components while running these workloads. We also propose advanced acceleration techniques to adapt Triple-H for iterative applications and study the impact of different parameters on the performance of MapReduce and Spark jobs on HPC systems. Our evaluations show that, although Tachyon is 5x faster than HDFS for primitive operations, Triple-H performs 47% and 2.4x better than Tachyon for MapReduce and Spark workloads, respectively. Triple-H also accelerates K-Means by 15% over HDFS and 9% over Tachyon.
Keywords :
"Sparks","Fault tolerance","Fault tolerant systems","Big data","Acceleration","Yarn","Systems architecture"
Publisher :
IEEE
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/BigData.2015.7363761
Filename :
7363761