DocumentCode :
146429
Title :
Improving HDFS write performance using efficient replica placement
Author :
Patel, Neha M. ; Patel, Narendra M. ; Hasan, Md. Imran ; Shah, Parth D. ; Patel, Mayur M.
Author_Institution :
CSPIT-CHARUSAT, Anand, India
fYear :
2014
fDate :
25-26 Sept. 2014
Firstpage :
36
Lastpage :
39
Abstract :
Over the last half decade, network applications have grown tremendously; we are experiencing an information-explosion era in which large amounts of distributed data must be stored and managed. Distributed file systems (DFS) are designed to handle such data. The major design issues in a DFS are scalability, fault tolerance, flexibility, and availability. The most prevalent DFS addressing these challenges is the Hadoop Distributed File System (HDFS), a variant of the Google File System (GFS). Apache Hadoop is able to address current Big Data issues by simplifying the implementation of data-intensive and highly parallel distributed applications. HDFS handles fault tolerance through data replication: each data block is replicated on different datanodes for reliability and availability. The existing HDFS implementation in Hadoop performs this replication in a pipelined manner, which takes considerable time. The proposed system is an alternative parallel approach to replica placement in HDFS that improves write throughput. Experiments comparing it with the existing pipelined replication approach show an improvement in HDFS write throughput of up to 10%, as measured by the TestDFSIO benchmark. This paper also analyzes how different HDFS configuration parameters, such as file block size and replication factor, affect HDFS write performance under both approaches.
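The distinction the abstract draws can be sketched in a few lines of Python. This is a hypothetical timing simulation, not HDFS code: in the pipelined scheme the client writes to the first datanode, which forwards to the next, so transfer times add up; in the parallel scheme the client streams to all datanodes concurrently. The transfer time and replication factor below are illustrative assumptions.

```python
import threading
import time

TRANSFER_TIME = 0.1      # assumed per-replica transfer time (seconds), illustration only
REPLICATION_FACTOR = 3   # HDFS default replication factor

def send_block(datanode_id):
    """Simulate transferring one block replica to a datanode."""
    time.sleep(TRANSFER_TIME)

def pipelined_write():
    """Existing HDFS scheme: client -> DN1 -> DN2 -> DN3, each hop after the previous."""
    start = time.time()
    for dn in range(REPLICATION_FACTOR):
        send_block(dn)  # sequential: total time grows with replication factor
    return time.time() - start

def parallel_write():
    """Proposed scheme: client sends to all datanodes concurrently."""
    start = time.time()
    threads = [threading.Thread(target=send_block, args=(dn,))
               for dn in range(REPLICATION_FACTOR)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # total time is roughly one transfer, not three
    return time.time() - start

if __name__ == "__main__":
    print(f"pipelined: {pipelined_write():.2f}s, parallel: {parallel_write():.2f}s")
```

With these toy numbers the pipelined write takes roughly `REPLICATION_FACTOR` transfer times while the parallel write takes roughly one, which is the intuition behind the throughput gain the paper reports.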
Keywords :
Big Data; distributed databases; fault tolerant computing; parallel processing; pipeline processing; Apache Hadoop; Big Data; GFS; Google file system; HDFS write performance; Hadoop distributed file system; TestDFSIO benchmark; data handling; data replication; distributed data; fault tolerance; file block size; information explosion era; network applications; parallel approach; parallel distributed applications; pipelined manner; replica placement; replication factor; Distributed databases; Fault tolerance; Fault tolerant systems; File systems; Pipelines; Throughput; Writing; Hadoop Distributed File System (HDFS); Parallel; Pipelined; Replication factor (R.F);
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Confluence The Next Generation Information Technology Summit (Confluence), 2014 5th International Conference
Conference_Location :
Noida
Print_ISBN :
978-1-4799-4237-4
Type :
conf
DOI :
10.1109/CONFLUENCE.2014.6949234
Filename :
6949234