DocumentCode
1791558
Title
In-memory I/O and replication for HDFS with Memcached: Early experiences
Author
Islam, Nusrat Sharmin ; Lu, Xiaoyi ; Wasi-ur-Rahman, Md ; Rajachandrasekar, Raghunath ; Panda, Dhabaleswar K. (DK)
Author_Institution
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
fYear
2014
fDate
27-30 Oct. 2014
Firstpage
213
Lastpage
218
Abstract
Hadoop is the de facto standard platform for large-scale data analytics applications. Despite its high availability and reliability guarantees, the Hadoop Distributed File System (HDFS) suffers from severe I/O bottlenecks when storing its tri-replicated data blocks. These I/O overheads, intrinsic to the HDFS architecture, degrade application performance. In this paper, we present a novel design (MEM-HDFS) that performs intelligent caching and replication of HDFS data blocks in Memcached to significantly improve I/O performance. In this design, we consider different deployment strategies for the Memcached servers (local and remote) and guarantee that Memcached data is persisted to HDFS on cache replacement. Performance evaluations show that MEM-HDFS can increase the read and write throughput of HDFS by up to 3.9x and 3.3x, respectively. Our design can also significantly speed up the phase of loading data into HDFS: it reduces the execution times of data generation benchmarks such as TeraGen, RandomTextWriter, and RandomWriter by up to 50%, 39%, and 48%, respectively. The proposed design also improves the performance of other benchmarks such as TeraSort and Grep.
Keywords
data analysis; distributed databases; HDFS architecture; Hadoop distributed file system; MEM-HDFS; Memcached data; Memcached servers; RandomTextWriter; TeraGen; data generation benchmarks; large-scale data analytic applications; reliability; Bandwidth; Benchmark testing; Computer architecture; Java; Loading; Servers; Throughput
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location
Washington, DC
Type
conf
DOI
10.1109/BigData.2014.7004235
Filename
7004235