Regional Information Center for Science and Technology - Efficient and Self-Balanced ROLLUP Aggregates for Large-Scale Data Summarization

DocumentCode :

1665420

Title :

Efficient and Self-Balanced ROLLUP Aggregates for Large-Scale Data Summarization

Author :

Duy-Hung Phan ; Quang-Nhat Hoang-Xuan ; Dell'Amico, Matteo ; Michiardi, Pietro

Author_Institution :

EURECOM, France

fYear :

2015

Firstpage :

158

Lastpage :

165

Abstract :

Data summarization queries that compute aggregates by grouping datasets across several dimensions are essential to help users make sense of very large datasets. In this work, we focus on ROLLUP, an important operator that has been recently added to the Hadoop MapReduce ecosystem. However, its current implementation suffers from very large communication costs, leading to inefficient executions. We thus proceed with the design of a new ROLLUP operator for high-level languages. Our operator is self-optimizing, which means that it automatically performs load-balancing and determines a suitable operating point to achieve the highest performance. We have implemented our ROLLUP operator for Apache Pig, a popular high-level language in the Hadoop ecosystem. Our experimental results, obtained on both synthetic and real datasets, indicate that our new operator outperforms the current ROLLUP implementation in Pig by at least 50%.
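
For readers unfamiliar with the operator, the following is a minimal single-machine sketch of what a ROLLUP aggregate computes: one sum per prefix of the dimension list, from the finest grouping down to the grand total. It is an illustration only, not the operator described in the paper and not Pig's implementation; the function name `rollup_sum`, the dimensions (`year`, `month`, `day`), and the sample rows are hypothetical.

```python
from collections import defaultdict


def rollup_sum(rows, dims, measure):
    """Sum `measure` for every prefix of `dims`, down to the grand total."""
    totals = defaultdict(int)
    for row in rows:
        values = tuple(row[d] for d in dims)
        # One group per prefix length: (year, month, day), (year, month),
        # (year,), and () for the grand total. Rolled-up dimensions are
        # padded with None so every key has the same width.
        for k in range(len(dims), -1, -1):
            key = values[:k] + (None,) * (len(dims) - k)
            totals[key] += row[measure]
    return dict(totals)


# Hypothetical sample data, for illustration only.
rows = [
    {"year": 2015, "month": 1, "day": 1, "sales": 10},
    {"year": 2015, "month": 1, "day": 2, "sales": 5},
    {"year": 2015, "month": 2, "day": 1, "sales": 7},
]
result = rollup_sum(rows, ["year", "month", "day"], "sales")
```

In this naive form each input row contributes to one group per prefix, so with d dimensions it produces d + 1 partial records. A distributed implementation has to decide where those partial aggregates are computed and merged, which is the kind of trade-off between communication and load balance that the abstract refers to.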

Keywords :

data handling; parallel processing; resource allocation; Apache Pig; Hadoop MapReduce ecosystem; ROLLUP operator; communication cost; data summarization queries; high-level language; large-scale data summarization; load balancing; self-balanced ROLLUP aggregates; self-optimizing operator; Aggregates; Algorithm design and analysis; Clustering algorithms; Load modeling; Partitioning algorithms; Runtime; Tuning; MapReduce; ROLLUP; data summarization; optimization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Big Data (BigData Congress), 2015 IEEE International Congress on

Conference_Location :

New York, NY

Print_ISBN :

978-1-4673-7277-0

Type :

conf

DOI :

10.1109/BigDataCongress.2015.31

Filename :

7207215

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1665420