Title :
Improving join performance for skewed databases
Author :
Cutt, Bryce ; Lawrence, Ramon
Author_Institution :
Univ. of British Columbia Okanagan, Okanagan, BC
Abstract :
The largest queries in data warehouses and decision support systems use hybrid hash join to relate information in multiple tables. Hybrid hash join operates independently of the data distributions of the join relations. Real-world data sets, however, are not uniformly distributed and often contain significant skew. Although partition skew has been studied for hash joins, no prior work has examined how exploiting data skew can improve performance. In this paper, we present histojoin, a join algorithm that uses histograms to identify data skew and improve join performance. Experimental results show that, for skewed data sets, histojoin performs significantly fewer I/O operations and is 20% to 60% faster than hybrid hash join.
Keywords :
data warehouses; file organisation; query processing; data warehouse; decision support system; histojoin; hybrid hash join performance; skewed database; Cost benefit analysis; Cost function; Database systems; Decision support systems; Frequency; Histograms; Partitioning algorithms; Performance analysis; hash join; histogram; skew
Conference_Title :
Electrical and Computer Engineering, 2008. CCECE 2008. Canadian Conference on
Conference_Location :
Niagara Falls, ON
Print_ISBN :
978-1-4244-1642-4
ISSN :
0840-7789
DOI :
10.1109/CCECE.2008.4564563