DocumentCode :
2129906
Title :
Improving join performance for skewed databases
Author :
Cutt, Bryce ; Lawrence, Ramon
Author_Institution :
Univ. of British Columbia Okanagan, Okanagan, BC
fYear :
2008
fDate :
4-7 May 2008
Abstract :
The largest queries in data warehouses and decision support systems use hybrid hash join to relate information in multiple tables. Hybrid hash join operates independently of the data distributions of the join relations. Real-world data sets are not uniformly distributed and often contain significant skew. Although partition skew has been studied for hash joins, no prior work has examined how exploiting data skew can improve performance. In this paper, we present Histojoin, a join algorithm that uses histograms to identify data skew and improve join performance. Experimental results show that, for skewed data sets, Histojoin performs significantly fewer I/O operations and is 20 to 60% faster than hybrid hash join.
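The abstract describes the approach only at a high level. The following is a minimal illustrative sketch, not the authors' algorithm, of why skew information can help. A hybrid hash join keeps some build tuples memory-resident without regard to how often their keys occur in the probe input. A skew-aware join can instead keep the keys that the probe input hits most often, so fewer probe tuples have to be spilled to disk. The function name `skew_aware_join`, the use of exact frequency counts in place of a real histogram, the key-order stand-in for hash partitioning, and the spilled-tuple count as a proxy for I/O are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def skew_aware_join(build, probe, memory_tuples, use_histogram=True):
    """Equi-join (key, payload) tuples on key; return (result, spilled_probe_count).

    Hypothetical sketch: build tuples for the keys chosen to stay resident are
    joined in memory while the probe input is scanned; every other probe tuple
    is counted as spilled (it would be written to a disk partition and re-read).
    """
    groups = defaultdict(list)  # build key -> build tuples with that key
    for t in build:
        groups[t[0]].append(t)

    if use_histogram:
        # Skew-aware choice: keep the keys most frequent in the probe input.
        # Exact counts stand in for a real histogram here (assumption).
        freq = Counter(k for k, _ in probe)
        order = sorted(groups, key=lambda k: freq[k], reverse=True)
    else:
        # Skew-oblivious stand-in for hybrid hash join: resident keys are chosen
        # without looking at probe frequencies (here simply by key order).
        order = sorted(groups)

    resident, spilled_build, used = {}, {}, 0
    for k in order:
        # Keep whole key groups together so no match is split across passes.
        if used + len(groups[k]) <= memory_tuples:
            resident[k] = groups[k]
            used += len(groups[k])
        else:
            spilled_build[k] = groups[k]

    result, spilled_probe = [], []
    for p in probe:
        if p[0] in resident:
            result.extend((b, p) for b in resident[p[0]])
        else:
            spilled_probe.append(p)  # would cost a write now and a read later

    # Second pass over the spilled partitions (simulated in memory here).
    for p in spilled_probe:
        result.extend((b, p) for b in spilled_build.get(p[0], []))
    return result, len(spilled_probe)


if __name__ == "__main__":
    build = [(k, f"dim{k}") for k in range(100)]
    # Skewed probe input: key 99 occurs 500 times, key 98 occurs 300 times.
    probe = [(99, i) for i in range(500)] + [(98, i) for i in range(300)] + [(k, 0) for k in range(98)]
    _, skew_aware = skew_aware_join(build, probe, 10, use_histogram=True)
    _, oblivious = skew_aware_join(build, probe, 10, use_histogram=False)
    print(skew_aware, oblivious)  # prints "90 888"
```

Both strategies produce the same join result. With room for 10 build tuples, however, the frequency-guided choice spills 90 of the 898 probe tuples, while the skew-oblivious choice spills 888. This toy model ignores partition sizing, histogram construction cost, and real disk behaviour, so it illustrates the direction of the effect rather than the magnitudes reported in the paper.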
Keywords :
data warehouses; file organisation; query processing; data warehouse; decision support system; histojoin; hybrid hash join performance; skewed database; Cost benefit analysis; Cost function; Data warehouses; Database systems; Decision support systems; Frequency; Histograms; Partitioning algorithms; Performance analysis; Query processing; hash join; histogram; skew
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Electrical and Computer Engineering, 2008. CCECE 2008. Canadian Conference on
Conference_Location :
Niagara Falls, ON
ISSN :
0840-7789
Print_ISBN :
978-1-4244-1642-4
Electronic_ISBN :
0840-7789
Type :
conf
DOI :
10.1109/CCECE.2008.4564563
Filename :
4564563