DocumentCode
146429
Title
Improving HDFS write performance using efficient replica placement
Author
Patel, Neha M.; Patel, Narendra M.; Hasan, Md Imran; Shah, Parth D.; Patel, Mayur M.
Author_Institution
CSPIT-CHARUSAT, Anand, India
fYear
2014
fDate
25-26 Sept. 2014
Firstpage
36
Lastpage
39
Abstract
In the last half decade, there has been tremendous growth in network applications; we are experiencing an information explosion era in which large amounts of distributed data must be stored and managed. Distributed file systems (DFS) are designed to handle such data. The major design issues in a DFS are scalability, fault tolerance, flexibility and availability. The most prevalent DFS addressing these challenges is the Hadoop Distributed File System (HDFS), a variant of the Google File System (GFS). Apache Hadoop is able to address current Big Data issues by simplifying the implementation of data-intensive, highly parallel distributed applications. HDFS achieves fault tolerance through data replication: each data block is replicated on different datanodes for reliability and availability. The existing implementation of HDFS in Hadoop performs this replication in a pipelined manner, which takes considerable time. The proposed system is an alternative parallel approach to replica placement in HDFS intended to improve throughput. Experiments comparing it with the existing pipelined replication approach show an improvement in HDFS write throughput of up to 10%, as measured by the TestDFSIO benchmark. The paper also presents an analysis of HDFS configuration parameters, such as file block size and replication factor, that affect HDFS write performance under both approaches.
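Code_Example
For orientation only, the following is a minimal sketch of an HDFS write from a client's point of view, using the standard Hadoop 2.x FileSystem API. It shows where the two configuration parameters the paper varies (dfs.replication and dfs.blocksize) enter; the paper's parallel replica-placement scheme itself lives inside HDFS and is not reproduced here. The class name, output path and payload size are illustrative, not taken from the paper.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        // Standard Hadoop client configuration; these are the two
        // parameters whose effect on write throughput the paper analyses.
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3");        // replication factor (R.F.)
        conf.set("dfs.blocksize", "134217728");  // 128 MB file block size

        FileSystem fs = FileSystem.get(conf);

        // Illustrative output path on HDFS.
        Path out = new Path("/benchmarks/sample-write.dat");

        // Write ~64 MB; with stock HDFS, each block of this file is
        // replicated to the datanodes through the write pipeline that the
        // paper's parallel approach replaces.
        try (FSDataOutputStream stream = fs.create(out, true)) {
            byte[] payload = new byte[64 * 1024];
            for (int i = 0; i < 1024; i++) {
                stream.write(payload);
            }
        }

        fs.close();
    }
}

In the paper itself, write throughput is measured with the TestDFSIO benchmark shipped with Hadoop rather than with an ad hoc client such as this one.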
Keywords
Big Data; distributed databases; fault tolerant computing; parallel processing; pipeline processing; Apache Hadoop; GFS; Google file system; HDFS write performance; Hadoop distributed file system; TestDFSIO benchmark; data handling; data replication; distributed data; fault tolerance; file block size; information explosion era; network applications; parallel approach; parallel distributed applications; pipelined manner; replica placement; replication factor; Distributed databases; Fault tolerance; Fault tolerant systems; File systems; Pipelines; Throughput; Writing; Hadoop Distributed File System (HDFS); Parallel; Pipelined; Replication factor (R.F.)
fLanguage
English
Publisher
ieee
Conference_Titel
2014 5th International Conference - Confluence The Next Generation Information Technology Summit (Confluence)
Conference_Location
Noida
Print_ISBN
978-1-4799-4237-4
Type
conf
DOI
10.1109/CONFLUENCE.2014.6949234
Filename
6949234
Link To Document