• DocumentCode
    2009782
  • Title

    Towards Multi-way Join Evaluating with Indexing Partition Support in Map-Reduce

  • Author

    Yunpeng Li ; Wenhai Li ; Biren Chen ; Wei Song ; Weidong Wen ; Wanghong Li

  • Author_Institution
    State Key Lab. of Software Eng., Wuhan Univ., Wuhan, China
  • fYear
    2013
  • fDate
    15-18 Dec. 2013
  • Firstpage
    307
  • Lastpage
    314
  • Abstract
    In the era of "big data", the emergence and increasing adoption of related enabling technologies make it possible for Map-Reduce to accommodate DSS (Decision Support Systems) loads, which are commonly targeted at high-performance Data Warehouse analyses in the context of RDBMS. However, the non-predetermined mapping of Map-Reduce tasks to physical machines makes it difficult to use the pre-partitioning and indexing techniques of DBMS to improve data locality. In this paper, targeting OLAP (Online Analytical Processing) workloads that evaluate multi-way joins, we introduce table partitioning by reference to Map-Reduce. To avoid dispersing the initial tuples that share the same segment key, we present a detailed description of the data organization model that partitions the dominated tables by cascade reference constraints. In order to push multiple joins on these clustered partitions down to the map task, we design a one-pass multi-way join algorithm along with its optimized implementations for the major Map-Reduce stages. We conduct an empirical study with the TPC-H benchmark on clusters of different scales, and experimentally verify the high efficiency of the proposed optimization model.
  • Keywords
    Big Data; data mining; data warehouses; database indexing; decision support systems; parallel processing; relational databases; DBMS indexing techniques; DBMS prepartitioned techniques; DSS load; MapReduce; OLAP workloads; RDBMS context; TPCH benchmark; cascade reference constraints; clustered partitions; data locality improvement; data organization model; dominated table partitioning; high-performance data warehouse analyses; indexing partition support; initial tuple dispersion avoidance; multiway join evaluation; one-pass multiway join algorithm; online analysis processing workloads; optimization model; Context; Data models; Indexing; Layout; Optimization; Organizations; Cascade reference; DBMS; Map-Reduce; Relational Partition; join index
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Systems (ICPADS), 2013 International Conference on
  • Conference_Location
    Seoul
  • ISSN
    1521-9097
  • Type

    conf

  • DOI
    10.1109/ICPADS.2013.51
  • Filename
    6808188