Title :
A Performance Analysis of MapReduce Task with Large Number of Files Dataset in Big Data Using Hadoop
Author :
Pal, Arnab ; Jain, Kunal ; Agrawal, Pulin ; Agrawal, Sanjay
Author_Institution :
Dept. of Comput. Eng. & Applic., Nat. Inst. of Tech. Teachers' Training & Res., Bhopal, India
Abstract :
Big Data refers to volumes of data that cannot be managed by traditional data management systems, and Hadoop is a technological answer to it. The Hadoop Distributed File System (HDFS) and the MapReduce programming model are used for storing and retrieving big data: files of terabyte scale can be stored on HDFS and analyzed with MapReduce. This paper introduces Hadoop HDFS and MapReduce for storing a large number of files and retrieving information from them. We present experimental work in which varying numbers of files are supplied as input to a Hadoop system and its performance is analyzed. We study the number of bytes written and read by the file system and by the MapReduce framework, and we analyze the behavior of the map and reduce tasks, together with the bytes they write and read, as the number of input files increases.
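The paper itself does not list code, so the following is only a minimal sketch of the kind of MapReduce job the abstract describes, written against the standard org.apache.hadoop.mapreduce API. The class names (FileWordCount, TokenMapper, SumReducer) and the two command-line arguments (input directory, output directory) are illustrative assumptions, not the authors' implementation; the counters Hadoop reports after a run (bytes read and written, numbers of map and reduce tasks) are the kind of figures such a study examines.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Hypothetical word-count job: every file placed under the input
    // directory becomes at least one map task, so adding files to the
    // dataset directly changes how many map tasks run and how many
    // bytes the framework reads and writes.
    public class FileWordCount {

        public static class TokenMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable offset, Text line, Context context)
                    throws IOException, InterruptedException {
                // Emit (word, 1) for each token in the current line.
                StringTokenizer tokens = new StringTokenizer(line.toString());
                while (tokens.hasMoreTokens()) {
                    word.set(tokens.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> counts, Context context)
                    throws IOException, InterruptedException {
                // Sum the partial counts produced by all map tasks.
                int sum = 0;
                for (IntWritable count : counts) {
                    sum += count.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "file word count");
            job.setJarByClass(FileWordCount.class);
            job.setMapperClass(TokenMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // directory holding the many input files
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist on HDFS
            // After completion, the job's counters (bytes read/written,
            // launched map and reduce tasks) are printed to the console.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

A typical invocation would be "hadoop jar filewordcount.jar FileWordCount /input_dir /output_dir", after which the console output includes the File System Counters and Job Counters that an analysis of this kind would record.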
Keywords :
Big Data; distributed databases; information retrieval; storage management; Hadoop HDFS; Hadoop distributed file system; MapReduce programming model; big data retrieval; big data storage; data management system; files dataset; information retrieval; performance analysis; Computers; Distributed databases; File systems; Google; Programming; Training; Data Node; HDFS; Hadoop; Job Tracker; MapReduce; Name Node; Secondary Name Node; Task Tracker; Teragen; Terasort; Teravalidate;
Conference_Title :
Communication Systems and Network Technologies (CSNT), 2014 Fourth International Conference on
Conference_Location :
Bhopal
Print_ISBN :
978-1-4799-3069-2
DOI :
10.1109/CSNT.2014.124