DocumentCode :
2484212
Title :
A partition-based approach to support streaming updates over persistent data in an active datawarehouse
Author :
Chakraborty, Abhirup ; Singh, Ajit
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Waterloo, Waterloo, ON, Canada
fYear :
2009
fDate :
23-29 May 2009
Firstpage :
1
Lastpage :
11
Abstract :
Active warehousing has emerged to meet high user demand for fresh, up-to-date information. Online refreshment of source updates introduces processing and disk overheads in the implementation of warehouse transformations. This paper considers a frequently occurring operator in active warehousing that computes the join between a fast, time-varying or bursty update stream S and a persistent disk relation R, using limited memory. Such a join operation is the crux of a number of common transformations (e.g., surrogate key assignment, duplicate detection, etc.) in an active data warehouse. We propose a partition-based join algorithm that minimizes the processing overhead, the disk overhead, and the delay in output tuples. The proposed algorithm exploits the spatio-temporal locality within the update stream, reduces the delay in output tuples by exploiting hot spots in the range or domain of the joining attributes, and at the same time shares the I/O cost of accessing disk data of relation R over a volume of tuples from update stream S. We present experimental results showing the effectiveness of the proposed algorithm.
Keywords :
active databases; data handling; data warehouses; active data warehouse; active warehousing; bursty update stream; disk overhead; online refreshment; output tuple delay; partition-based join algorithm; persistent data; persistent disk relation; processing overhead; source updates; spatio-temporal locality; time varying stream; warehouse transformations; Costs; Data mining; Data warehouses; Delay effects; Partitioning algorithms; Pipelines; Table lookup; Warehousing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on
Conference_Location :
Rome
ISSN :
1530-2075
Print_ISBN :
978-1-4244-3751-1
Electronic_ISBN :
1530-2075
Type :
conf
DOI :
10.1109/IPDPS.2009.5161064
Filename :
5161064