• DocumentCode
    1959673
  • Title

    Handling data skew in parallel hash join computation using two-phase scheduling

  • Author

    Zhou, Xiaofang ; Orlowska, Maria E.

  • Author_Institution
    Div. of Inf. Technol., CSIRO, Canberra, ACT, Australia
  • Volume
    2
  • fYear
    1995
  • fDate
    19-21 Apr 1995
  • Firstpage
    527
  • Abstract
    A large number of parallel join algorithms have been proposed to maintain load balance in the presence of data skew. However, one important type of data skew, join product skew (JPS), has been little studied. In this paper, a dynamic parallel join algorithm, which employs a two-phase scheduling procedure, is designed to handle the JPS problem. Two sets of scheduling heuristics are studied against various parameters. It is shown that many of the existing algorithms can be regarded as special cases of our algorithm, whose cost depends on the nature of the data skew. While it can cope with JPS, which other algorithms cannot handle, it can be as efficient as most existing algorithms when JPS does not exist.
  • Keywords
    parallel algorithms; processor scheduling; query processing; relational databases; resource allocation; data skew; dynamic parallel join algorithm; load-balancing; parallel hash join computation; two-phase scheduling; two-phase scheduling procedure; Algorithm design and analysis; Computer science; Concurrent computing; Dynamic scheduling; Government; Information technology; Parallel architectures; Processor scheduling; Relational databases; Scheduling algorithm;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Algorithms and Architectures for Parallel Processing, 1995. ICAPP 95. IEEE First ICA³PP., IEEE First International Conference on
  • Conference_Location
    Brisbane, Qld.
  • Print_ISBN
    0-7803-2018-2
  • Type

    conf

  • DOI
    10.1109/ICAPP.1995.472237
  • Filename
    472237