Title :
Improving the Efficiency of Storing for Small Files in HDFS
Author :
Zhang, Yang ; Liu, Dan
Author_Institution :
Res. Inst. of Electron. Sci. & Technol., Univ. of Electron. Sci. & Technol. of China, Chengdu, China
Abstract :
HDFS (Hadoop Distributed File System) is a popular distributed file system, but it handles small files inefficiently. Traditional methods suffer from high resource consumption and low efficiency. To resolve this problem, this paper proposes a novel approach to small-file processing that works as an engine independent of HDFS and effectively reduces HDFS overhead. The engine uses Reactor-style multiplexed I/O to build the server and non-blocking I/O to merge and read small files. It also keeps a cache of small files to make reads efficient. The paper presents a small-file processing strategy for efficient file merging, which builds a file index and uses a boundary file block filling mechanism to accomplish file separation and file retrieval. Experimental results show that the approach improves the efficiency of storing and processing massive numbers of small files in HDFS.
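
The abstract does not give implementation details, so the following is only a minimal sketch of how merging with a file index and boundary filling might work, not the authors' method. It assumes that "boundary file block filling" means padding the remainder of a block when the next small file would otherwise straddle a block boundary. The names `IndexEntry`, `merge_small_files`, and `read_small_file` are hypothetical, and the merged data is held in memory rather than written to HDFS.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    offset: int  # byte offset of the small file inside the merged data
    length: int  # size of the small file in bytes


def merge_small_files(files, block_size):
    """Merge {name: bytes} into one byte string plus an index.

    Assumption: a file that does not fit in the space left in the
    current block is moved to the start of the next block, and the
    leftover space is filled with zero bytes.
    """
    merged = bytearray()
    index = {}
    for name, data in files.items():
        if len(data) > block_size:
            raise ValueError(f"{name} is larger than one block")
        remaining = block_size - (len(merged) % block_size)
        if len(data) > remaining:
            merged.extend(b"\0" * remaining)  # fill up to the block boundary
        index[name] = IndexEntry(offset=len(merged), length=len(data))
        merged.extend(data)
    return bytes(merged), index


def read_small_file(merged, index, name):
    """Retrieve one small file from the merged data using the index."""
    entry = index[name]
    return merged[entry.offset:entry.offset + entry.length]


if __name__ == "__main__":
    small_files = {"a.txt": b"aaaaaa", "b.txt": b"bbbbbb", "c.txt": b"cc"}
    merged, index = merge_small_files(small_files, block_size=8)
    print(index)
    print(read_small_file(merged, index, "b.txt"))
```

With this layout every small file lies inside a single block, so a lookup needs only the index entry and one block read. The cost is the padding bytes, which grow as file sizes approach the block size.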
Keywords :
file organisation; indexing; information retrieval; input-output programs; records management; HDFS; Hadoop Distributed File System; boundary file block filling mechanism; file index; files efficient merger; files retrieval; files separation; massive small file processing; massive small file storing; nonblocking IO; reactor multiplexed IO; small files process; Conferences; Corporate acquisitions; Engines; File systems; Indexes; Memory management; Servers; Hadoop Distributed FileSystem; file merger mechanism; small file storage;
Conference_Title :
Computer Science & Service System (CSSS), 2012 International Conference on
Conference_Location :
Nanjing
Print_ISBN :
978-1-4673-0721-5
DOI :
10.1109/CSSS.2012.556