Big data emerging technologies: A case study with analyzing Twitter data using Apache Hive

Author

Aditya Bhardwaj; Vanraj; Ankit Kumar; Yogendra Narayan; Pawan Kumar

Author_Institution

Computer Science & Engineering Department, National Institute of Technical Teachers Training and Research, Chandigarh, India

fYear

2015

Firstpage

1

Lastpage

6

Abstract

These are days of growth and innovation for a better future. Companies increasingly recognize the need for Big Data to make decisions about complex problems. Big Data refers to collections of large datasets whose size (in the range of petabytes to zettabytes), high rate of growth, and complexity make them difficult to process and analyze using conventional database technologies. Big Data is generated from various sources, such as the social networking sites Facebook and Twitter, and the generated data can be structured, semi-structured, or unstructured. To extract valuable information from this volume of data, organizations need new tools and techniques that let them derive business benefits and gain a competitive advantage in the market. This paper presents a comprehensive study of the major emerging Big Data technologies, highlighting their important features and how they work, together with a comparative study between them. It also presents a performance analysis of Apache Hive queries executed on Twitter tweets, measuring the MapReduce CPU time spent and the total time taken to finish the job.

Keywords

"Big data","Twitter","File systems","Computer architecture","Google","Servers","Writing"

Publisher

IEEE

Conference_Titel

Recent Advances in Engineering & Computational Sciences (RAECS), 2015 2nd International Conference on

Type

conf

DOI

10.1109/RAECS.2015.7453400

Filename

7453400