Title :
Data skew and the scalability of parallel joins
Author :
Walton, Christopher B. ; Dale, Alfred G.
Author_Institution :
Dept. of Comput. Sci., Texas Univ., Austin, TX, USA
Abstract :
When data are uniformly distributed, parallel join algorithms scale up well. However, scalability is curtailed by data skew, the nonuniform distribution of data between processors. Investigation of this problem has been hampered by incomplete understanding of data skew as well as inadequate analytic performance models. The authors use a new model of data skew that addresses these shortcomings to examine the effects of skewed workloads on the scalability of the hybrid hash, scheduling hash, and sort-merge parallel join algorithms. Results indicate that the extent to which data skew degrades scalability varies with the join algorithm, the workload, and the type of data skew. None of the three algorithms has the best scalability and response time in all cases.
Keywords :
parallel algorithms; performance evaluation; relational databases; analytic performance models; data skew; hybrid hash; nonuniform distribution; parallel joins; relational processing; response time; scalability; scheduling hash; skewed workloads; sort-merge parallel join algorithms; Concurrent computing; Degradation; Delay; Distributed computing; Partitioning algorithms; Performance analysis; Processor scheduling; Relational databases; Scalability; Taxonomy
Conference_Title :
Parallel and Distributed Processing, 1991. Proceedings of the Third IEEE Symposium on
Conference_Location :
Dallas, TX
Print_ISBN :
0-8186-2310-1
DOI :
10.1109/SPDP.1991.218298