DocumentCode :
1925358
Title :
Improving metadata management for small files in HDFS
Author :
Mackey, Grant ; Sehrish, Saba ; Wang, Jun
Author_Institution :
Sch. of Electr. Eng. & Comput. Sci., Univ. of Central Florida, Orlando, FL, USA
fYear :
2009
fDate :
Aug. 31 2009-Sept. 4 2009
Firstpage :
1
Lastpage :
4
Abstract :
Scientific applications are adapting HDFS/MapReduce to perform large scale data analytics. One of the major challenges is that an overabundance of small files is common in these applications, and HDFS manages all its files through a single server, the Namenode. It is anticipated that small files can significantly impact the performance of Namenode. In this work we propose a mechanism to store small files in HDFS efficiently and improve the space utilization for metadata. Our scheme is based on the assumption that each client is assigned a quota in the file system, for both the space and number of files. In our approach, we utilize the compression method `harballing´, provided by Hadoop, to better utilize the HDFS. We provide for new job functionality to allow for in-job archival of directories and files so that running MapReduce programs may complete without being killed by the jobtracker due to quota policies. This approach leads to better functionality of metadata operations and more efficient usage of the HDFS. Our analysis results show that we can reduce the metadata footprint in main memory by a factor of 42.
Keywords :
cache storage; data analysis; meta data; HDFS-MapReduce program; Namenode; file storage; hadoop distributed file system; large datasets analysis; metadata management; single server; Application software; Availability; Computer architecture; Computer science; Data analysis; Engineering management; File servers; File systems; Large-scale systems; Performance analysis;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on
Conference_Location :
New Orleans, LA
ISSN :
1552-5244
Print_ISBN :
978-1-4244-5011-4
Electronic_ISBN :
1552-5244
Type :
conf
DOI :
10.1109/CLUSTR.2009.5289133
Filename :
5289133
Link To Document :
بازگشت