• DocumentCode
    1203420
  • Title

    New algorithms for parallelizing relational database joins in the presence of data skew

  • Author

    Wolf, Joel L. ; Dias, Daniel M. ; Yu, Philip S. ; Turek, John

  • Author_Institution
    Res. Div., IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
  • Volume
    6
  • Issue
    6
  • fYear
    1994
  • fDate
    12/1/1994 12:00:00 AM
  • Firstpage
    990
  • Lastpage
    997
  • Abstract
    Parallel processing is an attractive option for relational database systems. As in any parallel environment, however, load balancing is a critical issue that affects overall performance. Load balancing for one common database operation in particular, the join of two relations, can be severely hampered for conventional parallel algorithms because of a natural phenomenon known as data skew. In a pair of recent papers (J. Wolf et al., 1993; 1993), we described two new join algorithms designed to address the data skew problem. We propose significant improvements to both algorithms, increasing their effectiveness while simultaneously decreasing their execution times. The paper then focuses on the comparative performance of the improved algorithms and their more conventional counterparts. The new algorithms outperform their more conventional counterparts in the presence of just about any skew at all, dramatically so in cases of high skew.
  • Keywords
    parallel algorithms; parallel programming; relational algebra; relational databases; resource allocation; common database operation; comparative performance; data skew; join algorithms; load balancing; parallel environment; parallel processing; relational database joins; relational database systems; Algorithm design and analysis; Design optimization; Load management; Parallel architectures; Partitioning algorithms; Processor scheduling; Scheduling algorithm
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/69.334888
  • Filename
    334888