Title :
Towards Multi-way Join Evaluating with Indexing Partition Support in Map-Reduce
Author :
Yunpeng Li ; Wenhai Li ; Biren Chen ; Wei Song ; Weidong Wen ; Wanghong Li
Author_Institution :
State Key Lab. of Software Eng., Wuhan Univ., Wuhan, China
Abstract :
In the era of "big data", the emergence and increasing adoption of related enabling technologies make it possible for Map-Reduce to accommodate DSS (Decision Support Systems) loads, which in the RDBMS context are commonly targeted at high-performance data warehouse analyses. However, the non-predetermined mapping of Map-Reduce tasks to physical machines makes it difficult to use the pre-partitioning and indexing techniques of a DBMS to improve data locality. In this paper, targeting OLAP (Online Analytical Processing) workloads that evaluate multi-way joins, we bring table partitioning by reference into Map-Reduce. To avoid dispersing the initial tuples that belong to the same segment keys, we present a detailed description of the data organization model that partitions the dominated tables by cascade reference constraints. In order to push multiple joins on these clustered partitions down to the map task, we design a one-pass multi-way join algorithm along with its optimization implementations for the major Map-Reduce stages. We conduct an empirical study with the TPC-H benchmark on clusters of different scales, and experimentally verify the high efficiency of the proposed optimization model.
Keywords :
Big Data; data mining; data warehouses; database indexing; decision support systems; parallel processing; relational databases; Big Data; DBMS indexing techniques; DBMS prepartitioned techniques; DSS load; MapReduce; OLAP workloads; RDBMS context; TPCH benchmark; cascade reference constraints; clustered partitions; data locality improvement; data organization model; decision support systems; dominated table partitioning; high-performance data warehouse analyses; indexing partition support; initial tuple dispersion avoidance; multiway join evaluation; one-pass multiway join algorithm; online analysis processing workloads; optimization model; Context; Data models; Indexing; Layout; Optimization; Organizations; Cascade reference; DBMS; Map-Reduce; Relational Partition; join index;
Conference_Title :
Parallel and Distributed Systems (ICPADS), 2013 International Conference on
Conference_Location :
Seoul
DOI :
10.1109/ICPADS.2013.51