• DocumentCode
    2710299
  • Title

    An effective algorithm for parallelizing sort merge joins in the presence of data skew

  • Author

    Wolf, Joel L. ; Dias, Daniel M. ; Yu, Philip S.

  • Author_Institution
    IBM Thomas J Watson Res. Center, Yorktown Heights, NY, USA
  • fYear
    1990
  • fDate
    2-4 Jul 1990
  • Firstpage
    103
  • Lastpage
    115
  • Abstract
    A parallel sort-merge-join algorithm that uses a divide-and-conquer approach to address the data skew problem is proposed. The algorithm adds an extra scheduling phase to the usual sort, transfer and join phases. During the scheduling phase, a parallelizable optimization algorithm, using the output of the sort phase, attempts to balance the load across the multiple processors in the subsequent join phase. The algorithm naturally identifies the largest skew elements and assigns each of them to an optimal number of processors. Assuming a Zipf-like distribution for data skew, the algorithm is shown to achieve very good load balancing for the join phase in a CPU-bound environment and to be very robust relative to the degree of data skew and the total number of processors
  • Keywords
    merging; parallel algorithms; relational databases; sorting; CPU-bound environment; Zipf-like distribution; data skew problem; divide-and-conquer approach; join phases; load balancing; multiple processors; parallel sort-merge-join algorithm; parallelizable optimization algorithm; robust; scheduling phase; sort phase; transfer; Delay; Load management; Parallel architectures; Parallel processing; Processor scheduling; Proposals; Prototypes; Relational databases; Robustness; Scheduling algorithm;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Databases in Parallel and Distributed Systems, 1990, Proceedings. Second International Symposium on
  • Conference_Location
    Dublin
  • Print_ISBN
    0-8186-2052-8
  • Type

    conf

  • DOI
    10.1109/DPDS.1990.113702
  • Filename
    113702