Title :
Zput: A speedy data uploading approach for the Hadoop Distributed File System
Author :
Youwei Wang ; Weiping Wang ; Can Ma ; Dan Meng
Author_Institution :
Integration Applic. Center, Inst. of Comput. Technol., Beijing, China
Abstract :
Hadoop Distributed File System (HDFS) is the storage component of the Hadoop framework, designed to maintain and process huge datasets efficiently across cluster nodes. To work with MapReduce, Hadoop's computation infrastructure, data must first be uploaded from local file systems to HDFS. Unfortunately, when data is of massive scale, the uploading procedure becomes extremely time-consuming and seriously delays urgent tasks. The primary contribution of this paper is Zput, a speedy data uploading mechanism that significantly accelerates uploading by using a metadata mapping approach. After describing the implementation and its advantages, the paper analyzes its disadvantages and eliminates them with an approach named remote block placement. Evaluation results show that the new mechanism can reduce the running time of the uploading process by about 60-90%, and that remote block placement can speed up block distribution by about 30-40%, while maintaining complete compatibility with upper-layer applications.
Keywords :
distributed databases; meta data; storage management; HDFS; Hadoop distributed file system; Zput; block distribution; computation infrastructure; dataset maintenance; dataset processing; metadata mapping approach; remote block placement; speedy data uploading approach; storage component; upper-layer applications; Cryptography; IP networks; Reliability; Switches; Block Replication and Placement; Distributed File System; Metadata Manipulation;
Conference_Title :
Cluster Computing (CLUSTER), 2013 IEEE International Conference on
Conference_Location :
Indianapolis, IN
DOI :
10.1109/CLUSTER.2013.6702648