DocumentCode
1737111
Title
HAR+: Archive and metadata distribution! Why not both?
Author
Dev, Dipayan; Patgiri, Ripon
Author_Institution
Dept. of Comput. Sci. & Eng., Nat. Inst. of Technol., Silchar, India
fYear
2015
Firstpage
1
Lastpage
6
Abstract
The size of the data used in today's enterprises has been expanding rapidly over the last few years, and the need to process and analyze these large volumes of data has grown with it. The Hadoop Distributed File System (HDFS) is an open-source Apache project designed to run on commodity hardware and to serve applications with large datasets (TB, PB). The HDFS architecture is based on a single master (the NameNode), which manages the metadata for a large number of slaves. For maximum efficiency, the NameNode keeps all of the metadata in RAM. As a result, when dealing with a huge number of small files, the NameNode often becomes a bottleneck for HDFS, because it may run out of memory. Apache Hadoop uses Hadoop ARchive (HAR) to deal with small files, but HAR is not efficient in a multi-NameNode environment, which requires automatic scaling of metadata. In this paper, we design Hadoop ARchive Plus (HAR+), a hashtable-based modification of the existing HAR that uses SHA-256 as the key. HAR+ is designed to provide greater reliability as well as automatic scaling of metadata: instead of storing the metadata on one NameNode, HAR+ distributes it across multiple NameNodes. Our results show that HAR+ significantly reduces the load on a single NameNode. This makes the cluster more scalable, more robust, and less prone to failure than a cluster using Hadoop ARchive.
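As an illustration only, the sketch below shows one way a SHA-256-keyed hashtable could spread file metadata over several NameNodes. The abstract states that SHA-256 is the key but not how keys are assigned to NameNodes, so the modulo placement used here is an assumption, and the names `sha256_key`, `namenode_for` and `MetadataCatalog` are hypothetical; this is not presented as the HAR+ implementation.

```python
import hashlib


def sha256_key(path: str) -> str:
    """Return the hex SHA-256 digest of a file path, used as the hashtable key."""
    return hashlib.sha256(path.encode("utf-8")).hexdigest()


def namenode_for(path: str, num_namenodes: int) -> int:
    """Pick a NameNode index for a path.

    Assumption: simple modulo placement on the digest; HAR+ may map
    keys to NameNodes differently.
    """
    if num_namenodes <= 0:
        raise ValueError("num_namenodes must be positive")
    return int(sha256_key(path), 16) % num_namenodes


class MetadataCatalog:
    """Hypothetical set of per-NameNode hashtables keyed by SHA-256 of the path."""

    def __init__(self, num_namenodes: int):
        self.num_namenodes = num_namenodes
        # One dict per NameNode stands in for that node's in-RAM metadata table.
        self.tables = [dict() for _ in range(num_namenodes)]

    def put(self, path: str, metadata: dict) -> int:
        """Store metadata on the NameNode chosen by the hash; return its index."""
        node = namenode_for(path, self.num_namenodes)
        self.tables[node][sha256_key(path)] = metadata
        return node

    def get(self, path: str):
        """Look up metadata by recomputing the key; None if the path is unknown."""
        node = namenode_for(path, self.num_namenodes)
        return self.tables[node].get(sha256_key(path))

    def load(self):
        """Number of metadata entries held by each NameNode."""
        return [len(t) for t in self.tables]


if __name__ == "__main__":
    catalog = MetadataCatalog(num_namenodes=4)
    for i in range(1000):
        catalog.put(f"/user/logs/part-{i:05d}.txt", {"size": i})
    print(catalog.load())  # entries per NameNode
    print(catalog.get("/user/logs/part-00042.txt"))  # {'size': 42}
```

Because any client can recompute the digest, a lookup goes straight to the owning NameNode without a central index. One trade-off of plain modulo placement is that changing the number of NameNodes remaps most keys; consistent hashing is a common alternative when nodes are added or removed often.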
Keywords
data handling; meta data; network operating systems; parallel architectures; public domain software; random-access storage; reliability; Apache Hadoop ARchive Plus; HAR+; HDFS architecture; Hadoop distributed file system; RAM; metadata distribution; multiple NameNode environment; open source; sha256; Computer architecture; Computers; File systems; Indexes; Random access memory; Reliability; Big Data; HAR; HDFS; Hadoop; Metadata; Small files
fLanguage
English
Publisher
IEEE
Conference_Title
Computer Communication and Informatics (ICCCI), 2015 International Conference on
Conference_Location
Coimbatore
Print_ISBN
978-1-4799-6804-6
Type
conf
DOI
10.1109/ICCCI.2015.7218119
Filename
7218119