DocumentCode
1899702
Title
A novel approach to improve the performance of Hadoop in handling of small files
Author
Gohil, Parth ; Panchal, Bakul ; Dhobi, J.S.
Author_Institution
Dept. of Comput. Sci. & Eng., Inst. of Technol., Varnama, India
fYear
2015
fDate
5-7 March 2015
Firstpage
1
Lastpage
5
Abstract
Hadoop is an open-source Java framework for dealing with big data. It has two core components: HDFS (Hadoop Distributed File System), which stores large amounts of data reliably, and MapReduce, a programming model that processes data in a parallel and distributed manner. Hadoop does not perform well with small files: a large number of small files places a heavy burden on the NameNode of HDFS and increases MapReduce execution time. Because Hadoop is designed to handle very large files, it suffers a performance penalty when dealing with a large number of small files. This research work gives an introduction to HDFS, the small file problem, and existing ways to deal with it, along with a proposed approach for handling small files. In the proposed approach, small files are merged using the MapReduce programming model on Hadoop, ignoring files whose size is larger than the Hadoop block size. This approach improves the performance of Hadoop in handling small files and reduces the memory required by the NameNode to store their metadata.
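The abstract describes merging small files into larger units while skipping any file already bigger than the HDFS block size. Below is a minimal Java sketch of that idea, not the authors' implementation: it uses the plain HDFS client and SequenceFile APIs rather than a MapReduce job, and the class name and directory paths are hypothetical. Each small file becomes one (filename, contents) record in a single merged SequenceFile, so the NameNode tracks one large file instead of many small ones.

// Hypothetical sketch of the small-file merging idea; not the paper's code.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFileMerger {
  public static void main(String[] args) throws IOException {
    Path inputDir = new Path(args[0]);   // e.g. /user/demo/small-files (hypothetical)
    Path mergedFile = new Path(args[1]); // e.g. /user/demo/merged.seq  (hypothetical)

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    long blockSize = fs.getDefaultBlockSize(mergedFile); // typically 64 MB or 128 MB

    try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(mergedFile),
        SequenceFile.Writer.keyClass(Text.class),
        SequenceFile.Writer.valueClass(BytesWritable.class))) {

      for (FileStatus status : fs.listStatus(inputDir)) {
        // Ignore directories and files already larger than one HDFS block.
        if (status.isDirectory() || status.getLen() > blockSize) {
          continue;
        }
        byte[] contents = new byte[(int) status.getLen()];
        try (FSDataInputStream in = fs.open(status.getPath())) {
          in.readFully(0, contents);
        }
        // One record per small file: key = original file name, value = raw bytes.
        writer.append(new Text(status.getPath().getName()),
                      new BytesWritable(contents));
      }
    }
  }
}

In the paper's approach the same merging step is driven by a MapReduce job; the sketch above keeps only the filtering-and-merging logic so the effect on NameNode metadata is easy to see.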
Keywords
data handling; distributed databases; parallel processing; HDFS; Hadoop distributed file system; MapReduce programming; NameNode; programming model; small files handling; Blogs; File systems; Memory management; Tutorials; Amazon EC2; HDFS; Hadoop; MapReduce; Small Files;
fLanguage
English
Publisher
ieee
Conference_Titel
2015 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT)
Conference_Location
Coimbatore
Print_ISBN
978-1-4799-6084-2
Type
conf
DOI
10.1109/ICECCT.2015.7226044
Filename
7226044