DocumentCode
146429
Title
Improving HDFS write performance using efficient replica placement
Author
Patel, Neha M.; Patel, Narendra M.; Hasan, Md Imran; Shah, Parth D.; Patel, Mayur M.
Author_Institution
CSPIT-CHARUSAT, Anand, India
fYear
2014
fDate
25-26 Sept. 2014
Firstpage
36
Lastpage
39
Abstract
In the last half decade, there has been tremendous growth in network applications; we are experiencing an information explosion era in which large amounts of distributed data must be stored and managed. Distributed file systems (DFS) are designed to handle such data. The major design issues in a DFS are scalability, fault tolerance, flexibility and availability. The most prevalent DFS addressing these challenges is the Hadoop Distributed File System (HDFS), a variant of the Google File System (GFS). Apache Hadoop is able to address current Big Data issues by simplifying the implementation of data-intensive, highly parallel distributed applications. HDFS achieves fault tolerance through data replication: each data block is replicated on different datanodes for reliability and availability. The existing implementation of HDFS in Hadoop performs this replication in a pipelined manner, which takes considerable time. The proposed system is an alternative parallel approach to replica placement in HDFS intended to improve throughput. Experiments comparing it with the existing pipelined replication approach show an improvement in HDFS write throughput of up to 10%, as measured by the TestDFSIO benchmark. The paper also presents an analysis of HDFS configuration parameters, such as file block size and replication factor, that affect HDFS write performance under both approaches.
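Code_Example
For orientation only, the following is a minimal sketch of an HDFS write from a client's point of view, using the standard Hadoop 2.x FileSystem API. It shows where the two configuration parameters the paper varies (dfs.replication and dfs.blocksize) enter; the paper's parallel replica-placement scheme itself lives inside HDFS and is not reproduced here. The class name, output path and payload size are illustrative, not taken from the paper.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        // Standard Hadoop client configuration; these are the two
        // parameters whose effect on write throughput the paper analyses.
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3");        // replication factor (R.F.)
        conf.set("dfs.blocksize", "134217728");  // 128 MB file block size

        FileSystem fs = FileSystem.get(conf);

        // Illustrative output path on HDFS.
        Path out = new Path("/benchmarks/sample-write.dat");

        // Write ~64 MB; with stock HDFS, each block of this file is
        // replicated to the datanodes through the write pipeline that the
        // paper's parallel approach replaces.
        try (FSDataOutputStream stream = fs.create(out, true)) {
            byte[] payload = new byte[64 * 1024];
            for (int i = 0; i < 1024; i++) {
                stream.write(payload);
            }
        }

        fs.close();
    }
}

In the paper itself, write throughput is measured with the TestDFSIO benchmark shipped with Hadoop rather than with an ad hoc client such as this one.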
Keywords
Big Data; distributed databases; fault tolerant computing; parallel processing; pipeline processing; Apache Hadoop; GFS; Google file system; HDFS write performance; Hadoop distributed file system; TestDFSIO benchmark; data handling; data replication; distributed data; fault tolerance; file block size; information explosion era; network applications; parallel approach; parallel distributed applications; pipelined manner; replica placement; replication factor; Distributed databases; Fault tolerance; Fault tolerant systems; File systems; Pipelines; Throughput; Writing; Hadoop Distributed File System (HDFS); Parallel; Pipelined; Replication factor (R.F.)
fLanguage
English
Publisher
ieee
Conference_Titel
2014 5th International Conference - Confluence The Next Generation Information Technology Summit (Confluence)
Conference_Location
Noida
Print_ISBN
978-1-4799-4237-4
Type
conf
DOI
10.1109/CONFLUENCE.2014.6949234
Filename
6949234
Link To Document