DocumentCode
1999295
Title
High Efficiency of Hybrid Resumption in Distributed Data Warehouses
Author
Gorawski, Marcin ; Marks, Pawel
Author_Institution
Inst. of Comput. Sci., Silesian Univ. of Technol., Gliwice
fYear
2005
fDate
26 Aug. 2005
Firstpage
323
Lastpage
327
Abstract
ETL processes are sometimes interrupted by a failure. In such a case, one of the interrupted-extraction resumption algorithms is usually used. In this paper we present a modified Design-Resume (DR) algorithm, extended to handle ETL processes that contain many loading nodes. We use the DR algorithm to resume a distributed data warehouse load process. The key feature of this algorithm is that it imposes no additional overhead on the normal ETL process. In our work we modify the algorithm to work with more than one loading node and combine it with a staging technique, which increases the efficiency of the resumption process. We call the combined algorithm the hybrid resumption algorithm. We discuss the benefits of our improvements based on the results of the tests we performed.
Keywords
data warehouses; distributed databases; ETL processes; distributed data warehouse load process; hybrid resumption algorithm; interrupted extraction resumption algorithms; modified Design-Resume algorithm; Algorithm design and analysis; Checkpointing; Computer science; Data mining; Data warehouses; Hardware; Java; Performance evaluation; Resumes; Testing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Database and Expert Systems Applications, 2005. Proceedings. Sixteenth International Workshop on
Conference_Location
Copenhagen
ISSN
1529-4188
Print_ISBN
0-7695-2424-9
Type
conf
DOI
10.1109/DEXA.2005.108
Filename
1508293