DocumentCode
1924689
Title
Mastiff: A MapReduce-based System for Time-Based Big Data Analytics
Author
Guo, Sijie ; Xiong, Jin ; Wang, Weiping ; Lee, Rubao
Author_Institution
State Key Lab. of Comput. Archit., Inst. of Comput. Technol., Beijing, China
fYear
2012
fDate
24-28 Sept. 2012
Firstpage
72
Lastpage
80
Abstract
Existing MapReduce-based warehousing systems are not specifically optimized for time-based big data analysis applications. Such applications have two characteristics: 1) data are continuously generated and must be stored persistently for a long period of time; 2) applications usually process data within a given time period, so typical queries use time-related predicates. Time-based big data analytics requires both high data loading speed and high query execution performance. However, existing systems, including current MapReduce-based solutions, do not solve this problem well because the two requirements are contradictory. We have implemented a MapReduce-based system, called Mastiff, which achieves both high data loading speed and high query performance. Mastiff exploits a systematic combination of a column group store structure and a lightweight helper structure. Furthermore, Mastiff uses an optimized table scan method and a column-based query execution engine to boost query performance. Based on extensive experimental results with diverse workloads, we show that Mastiff can significantly outperform existing systems including Hive, HadoopDB, and GridSQL.
Keywords
data analysis; data warehouses; distributed processing; query processing; storage management; GridSQL; HadoopDB; Hive; MapReduce-based warehousing systems; Mastiff; column group store structure; column-based query execution engine; data continuous generation; data processing; data storage; helper structure; high data loading speed; high query execution performance; optimized table scan method; query performance; time-based big data analytics; time-related predicate query; Data handling; Data storage systems; Engines; Indexes; Information management; Loading; Servers; MapReduce; time-based data analytics;
fLanguage
English
Publisher
IEEE
Conference_Titel
Cluster Computing (CLUSTER), 2012 IEEE International Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4673-2422-9
Type
conf
DOI
10.1109/CLUSTER.2012.10
Filename
6337767