Parallel implementations of exclusion joins

Author

Shum, Chung-Dak

Author_Institution

Dept. of Comput. Sci., Hong Kong Univ. of Sci. & Technol., Kowloon, Hong Kong

fYear

1993

fDate

1-4 Dec 1993

Firstpage

742

Lastpage

747

Abstract

This paper examines the parallel processing of exclusion joins in a shared-nothing multiprocessor environment. First, a parallel hash-based exclusion join algorithm is presented. Unlike the case of equijoin, this algorithm does not work correctly in the presence of nulls in the join attributes. One solution is to restrict the hash-on attributes to non-nullable fields. However, this can lead to the well-known data skew problem. If the number of tuples containing null values in their join attributes is small, an alternative is to replicate those tuples to all processors. Otherwise, we can consider a range partitioning algorithm where those tuples are only sent to a small subset of the processors. The hash-based algorithm usually outperforms the range partitioning algorithm except when the number of tuples containing null values in their join attributes is large or when the data is highly skewed.
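The null problem and the replication remedy described in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the paper's algorithm: the abstract does not define the null semantics, so the code assumes SQL NOT IN style behaviour (a null inner key makes every comparison unknown, so no outer tuple qualifies), a single integer join attribute, and non-null outer keys. The function names and the `replicate_nulls` flag are hypothetical.

```python
def local_exclusion_join(outer_part, inner_part):
    """Exclusion join on one processor under the assumed NOT IN semantics."""
    # A null inner key makes every comparison unknown, so nothing qualifies.
    if any(key is None for key in inner_part):
        return []
    inner_keys = set(inner_part)
    return [key for key in outer_part if key not in inner_keys]


def parallel_exclusion_join(outer, inner, n_procs, replicate_nulls=True):
    """Simulate a hash-partitioned exclusion join over n_procs processors."""
    outer_parts = [[] for _ in range(n_procs)]
    inner_parts = [[] for _ in range(n_procs)]
    for key in outer:  # assumption: outer keys are non-null integers
        outer_parts[key % n_procs].append(key)
    for key in inner:
        if key is not None:
            inner_parts[key % n_procs].append(key)
        elif replicate_nulls:
            for part in inner_parts:  # send the null tuple to every processor
                part.append(key)
        else:
            inner_parts[0].append(key)  # naive: null lands on one processor only
    result = []
    for outer_part, inner_part in zip(outer_parts, inner_parts):
        result.extend(local_exclusion_join(outer_part, inner_part))
    return sorted(result)
```

With outer keys 1 to 6, inner keys `[2, None]`, and three processors, the replicated version returns an empty result, which matches the assumed single-node semantics. Calling it with `replicate_nulls=False` returns `[1, 4, 5]`, because the processors that never see the null tuple emit outer tuples that should have been excluded.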

Keywords

database theory; distributed databases; parallel algorithms; query processing; data skew problem; exclusion join; hash-based algorithm; hash-on attributes; parallel hash-based exclusion join algorithm; parallel processing; range partitioning algorithm; shared-nothing multiprocessor environment; Computer science; Database machines; Joining IEEE; Parallel processing; Partitioning algorithms

fLanguage

English

Publisher

IEEE

Conference_Titel

Parallel and Distributed Processing, 1993. Proceedings of the Fifth IEEE Symposium on

Conference_Location

Dallas, TX

Print_ISBN

0-8186-4222-X

Type

conf

DOI

10.1109/SPDP.1993.395458

Filename

395458