• DocumentCode
    2533455
  • Title

    Efficient Star Join for Column-oriented Data Store in the MapReduce Environment

  • Author

    Zhu, Haitong ; Zhou, Minqi ; Xia, Fan ; Zhou, Aoying

  • Author_Institution
    Inst. of Massive Comput., East China Normal Univ., Shanghai, China
  • fYear
    2011
  • fDate
    21-23 Oct. 2011
  • Firstpage
    13
  • Lastpage
    18
  • Abstract
    MapReduce is a parallel computing paradigm that has gained considerable attention from both industry and academia in recent years. Unlike parallel DBMSs, MapReduce makes it easier for non-experts to develop scalable parallel programs for analytical applications over huge data sets across clusters of commodity machines. Because its processing is scan-oriented, MapReduce inevitably accesses many unnecessary tuples, so its performance for relational operators, especially table joins, can be improved dramatically. In this paper, we propose an efficient star join strategy for column-oriented data stores, called HdBmp join, which uses a three-level content-aware index (the HdBmp index). Armed with this index, most of the unnecessary tuples in join processing can be filtered out, resulting in a large reduction in both communication cost and execution time. Our extensive experimental studies confirm the efficiency, scalability, and effectiveness of the proposed join method.
  • Keywords
    parallel databases; MapReduce environment; column-oriented data store; parallel DBMS; parallel computing paradigm; scan-oriented processing; Algorithm design and analysis; Benchmark testing; Data models; Distributed databases; Indexes; Memory; Scalability; HdBmp index; HdBmp join; column store; star join
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Information Systems and Applications Conference (WISA), 2011 Eighth
  • Conference_Location
    Chongqing
  • Print_ISBN
    978-1-4577-1812-0
  • Type

    conf

  • DOI
    10.1109/WISA.2011.10
  • Filename
    6093595