DocumentCode
2009782
Title
Towards Multi-way Join Evaluating with Indexing Partition Support in Map-Reduce
Author
Yunpeng Li ; Wenhai Li ; Biren Chen ; Wei Song ; Weidong Wen ; Wanghong Li
Author_Institution
State Key Lab. of Software Eng., Wuhan Univ., Wuhan, China
fYear
2013
fDate
15-18 Dec. 2013
Firstpage
307
Lastpage
314
Abstract
In the era of "big data", the emergence and increasing adoption of related enabling technologies make it possible for Map-Reduce to accommodate DSS (Decision Support Systems) load, which is commonly targeted at high-performance data warehouse analyses in the context of RDBMS. However, the non-predetermined mapping of Map-Reduce tasks to physical machines makes it difficult to use the pre-partitioning and indexing techniques of DBMS to improve data locality. In this paper, targeting multi-way join evaluation for OLAP (Online Analytical Processing) workloads, we introduce table partitioning by reference to Map-Reduce. To avoid dispersing the initial tuples that belong to the same segment keys, we present a detailed description of the data organization model that partitions the dominated tables by cascade reference constraints. In order to push multiple joins on these clustered partitions down to the map task, we design a one-pass multi-way join algorithm along with its optimization implementations for the major Map-Reduce stages. We conduct an empirical study with the TPC-H benchmark on clusters of different scales, and experimentally verify the high efficiency of the proposed optimization model.
Keywords
Big Data; data mining; data warehouses; database indexing; decision support systems; parallel processing; relational databases; Big Data; DBMS indexing techniques; DBMS prepartitioned techniques; DSS load; MapReduce; OLAP workloads; RDBMS context; TPCH benchmark; cascade reference constraints; clustered partitions; data locality improvement; data organization model; decision support systems; dominated table partitioning; high-performance data warehouse analyses; indexing partition support; initial tuple dispersion avoidance; multiway join evaluation; one-pass multiway join algorithm; online analysis processing workloads; optimization model; Context; Data models; Indexing; Layout; Optimization; Organizations; Cascade reference; DBMS; Map-Reduce; Relational Partition; join index;
fLanguage
English
Publisher
ieee
Conference_Titel
Parallel and Distributed Systems (ICPADS), 2013 International Conference on
Conference_Location
Seoul
ISSN
1521-9097
Type
conf
DOI
10.1109/ICPADS.2013.51
Filename
6808188