Processing performance on Apache Pig, Apache Hive and MySQL cluster

Author

Fuad, Ammar ; Erwin, Alva ; Ipung, Heru Purnomo

Author_Institution

Inf. Technol., Swiss German Univ. Edutown BSD City, Tangerang, Indonesia

fYear

2014

fDate

24 Sept. 2014

Firstpage

297

Lastpage

302

Abstract

MySQL Cluster is a well-known clustered database used to store and manipulate data. Its drawback is that as the data grows, the time required to process it increases and additional resources may be needed. With Hadoop, Hive, and Pig, processing can be faster than with MySQL Cluster. In this paper, the three systems are tested with the same data model, running simple queries to find out at how many rows Hive or Pig becomes faster than MySQL Cluster. The results, obtained with a data model taken from the GroupLens Research Project [12], show that Hive is the most appropriate for this data model in a low-cost hardware environment.

Keywords

SQL; data handling; Hadoop; Hive; MySQL Cluster; Pig; Apache Hive; Apache Pig; clustered database; data model; GroupLens Research Project; hardware environment; processing performance; Data models; Distributed databases; Educational institutions; Hardware; Motion pictures; Servers; Sorting; MySQL; Processing big data

fLanguage

English

Publisher

IEEE

Conference_Titel

Information, Communication Technology and System (ICTS), 2014 International Conference on

Conference_Location

Surabaya

Print_ISBN

978-1-4799-6857-2

Type

conf

DOI

10.1109/ICTS.2014.7010600

Filename

7010600