DocumentCode :
1737111
Title :
HAR+: Archive and metadata distribution! Why not both?
Author :
Dev, Dipayan ; Patgiri, Ripon
Author_Institution :
Dept. of Comput. Sci. & Eng., Nat. Inst. of Technol., Silchar, India
fYear :
2015
Firstpage :
1
Lastpage :
6
Abstract :
The volume of data used in today's enterprises has grown rapidly over the last few years. At the same time, the need to process and analyze large volumes of data has also increased. The Hadoop Distributed File System (HDFS) is an open-source Apache project designed to run on commodity hardware and to handle applications with large datasets (TB, PB). The HDFS architecture is based on a single master (NameNode), which handles the metadata for a large number of slaves. To maximize efficiency, the NameNode stores all of the metadata in its RAM. Consequently, when dealing with a huge number of small files, the NameNode often becomes a bottleneck for HDFS, as it might run out of memory. Apache Hadoop uses Hadoop ARchive (HAR) to deal with small files, but HAR is not efficient in a multi-NameNode environment, which requires automatic scaling of metadata. In this paper, we design a hash-table-based architecture, Hadoop ARchive Plus (HAR+), with SHA-256 as the key, which is a modification of the existing HAR. HAR+ is designed to provide greater reliability as well as automatic scaling of metadata. Instead of using one NameNode to store the metadata, HAR+ uses multiple NameNodes. Our results show that HAR+ significantly reduces the load on a single NameNode. This makes the cluster more scalable, more robust, and less prone to failure than one using Hadoop ARchive.
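The idea of keying metadata by a SHA-256 hash so that it spreads across several NameNodes can be illustrated with a minimal sketch. The abstract does not say how HAR+ maps a hash to a NameNode, so the modulo placement below, the function names `namenode_for` and `distribute`, and the example paths are all assumptions made for illustration, not the authors' design.

```python
import hashlib


def namenode_for(path: str, num_namenodes: int) -> int:
    """Return the index of the NameNode assumed to hold metadata for `path`.

    Assumption: the SHA-256 digest of the path is reduced modulo the number
    of NameNodes. The paper may use a different placement rule.
    """
    if num_namenodes < 1:
        raise ValueError("num_namenodes must be at least 1")
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_namenodes


def distribute(paths, num_namenodes):
    """Group file paths by the NameNode index chosen by `namenode_for`."""
    buckets = {i: [] for i in range(num_namenodes)}
    for p in paths:
        buckets[namenode_for(p, num_namenodes)].append(p)
    return buckets


if __name__ == "__main__":
    # Hypothetical small files inside an archive.
    files = [f"/archive/part-{i}.txt" for i in range(1000)]
    for node, held in distribute(files, 4).items():
        print(f"NameNode {node}: {len(held)} metadata entries")
```

Because the hash output is close to uniform, each NameNode in this sketch ends up with roughly an equal share of the entries. Plain modulo placement remaps most keys whenever the number of NameNodes changes, so a system that needs automatic scaling would likely require a more stable scheme.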
Keywords :
data handling; meta data; network operating systems; parallel architectures; public domain software; random-access storage; reliability; Apache Hadoop ARchive Plus; HAR+; HDFS architecture; Hadoop distributed file system; RAM; metadata distribution; multiple NameNode environment; open source; reliability; sha256; Computer architecture; Computers; File systems; Indexes; Random access memory; Reliability; Big Data; HAR; HDFS; Hadoop; Metadata; Small files;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Communication and Informatics (ICCCI), 2015 International Conference on
Conference_Location :
Coimbatore
Print_ISBN :
978-1-4799-6804-6
Type :
conf
DOI :
10.1109/ICCCI.2015.7218119
Filename :
7218119