Title :
Big Data Analysis Using Apache Hadoop
Author :
Manikandan, Shankar Ganesh ; Ravi, Siddarth
Author_Institution :
Dept. of Inf. Technol., Dhanalakshmi Coll. of Eng., Chennai, India
Abstract :
We live in an on-demand, on-command digital universe in which institutions, individuals and machines generate data at a very high rate. This data is categorized as "Big Data" because of its sheer Volume, Variety and Velocity. Most of it is unstructured, quasi-structured or semi-structured, and it is heterogeneous in nature. The volume and heterogeneity of the data, together with the speed at which it is generated, make it difficult for present computing infrastructure to manage Big Data. Traditional data management, warehousing and analysis systems lack the tools to analyze this data. Because of its specific nature, Big Data is stored in distributed file system architectures. Apache Hadoop and HDFS are widely used for storing and managing Big Data. Analyzing Big Data is a challenging task, as it involves large distributed file systems that should be fault tolerant, flexible and scalable. Map Reduce is widely used for the efficient analysis of Big Data. Traditional DBMS techniques such as joins and indexing, and other techniques such as graph search, are used for classification and clustering of Big Data. These techniques are being adapted for use in Map Reduce. In this paper we suggest various methods for addressing these problems through the Map Reduce framework over the Hadoop Distributed File System (HDFS). Map Reduce is a minimization technique that makes use of file indexing with mapping, sorting, shuffling and finally reducing. This paper studies Map Reduce techniques implemented for Big Data analysis using HDFS.
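The mapping, shuffling, sorting and reducing stages named in the abstract can be illustrated with a minimal single-process sketch. This is not the paper's implementation and does not use the Hadoop API: the word-count task, the sample records and all function names (`map_phase`, `shuffle_and_sort`, `reduce_phase`, `word_count`) are assumptions chosen only to show how data flows between the stages.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle_and_sort(pairs):
    """Shuffle: group values by key. Sort: return the groups ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped}


def word_count(records):
    """Chain the stages: map, then shuffle and sort, then reduce."""
    return reduce_phase(shuffle_and_sort(map_phase(records)))


if __name__ == "__main__":
    # Hypothetical input; in Hadoop the records would be read from HDFS blocks.
    sample = ["big data", "Big Data analysis"]
    print(word_count(sample))
```

In a real Hadoop job the map and reduce functions run on different nodes and the framework performs the shuffle and sort across the network; here every stage runs in one process so that the intermediate key/value pairs are easy to inspect.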
Keywords :
Big Data; distributed databases; network operating systems; pattern clustering; software fault tolerance; Apache Hadoop; DBMS techniques; HDFS; Hadoop distributed file system; Map Reduce framework; big data analysis; computing infrastructure; data analysis systems; data classification; data clustering; data heterogeneity; data management; data volume; data warehousing; fault tolerant; file indexing; graph search; joins; mapping; minimization technique; shuffling; sorting; Big data; Computer architecture; Distributed databases; Fault tolerance; Fault tolerant systems; File systems; Program processors;
Conference_Title :
IT Convergence and Security (ICITCS), 2014 International Conference on
Conference_Location :
Beijing
DOI :
10.1109/ICITCS.2014.7021746