Title :
High Efficiency of Hybrid Resumption in Distributed Data Warehouses
Author :
Gorawski, Marcin ; Marks, Pawel
Author_Institution :
Inst. of Comput. Sci., Silesian Univ. of Technol., Gliwice
Abstract :
ETL processes are sometimes interrupted by failures. In such cases, one of the algorithms for resuming an interrupted extraction is usually applied. In this paper we present a modified Design-Resume (DR) algorithm extended to handle ETL processes that contain many loading nodes. We use the DR algorithm to resume a distributed data warehouse load process. The key feature of this algorithm is that it imposes no additional overhead on the normal ETL process. In our work we modify the algorithm to work with more than one loading node and combine it with a staging technique, which increases the efficiency of the resumption process. We call the combined algorithm the hybrid resumption algorithm. Based on the results of the tests performed, we discuss the benefits of our improvements.
Keywords :
data warehouses; distributed databases; ETL processes; distributed data warehouse load process; hybrid resumption algorithm; interrupted extraction resumption algorithms; modified Design-Resume algorithm; Algorithm design and analysis; Checkpointing; Computer science; Data mining; Data warehouses; Hardware; Java; Performance evaluation; Resumes; Testing
Conference_Title :
Database and Expert Systems Applications, 2005. Proceedings. Sixteenth International Workshop on
Conference_Location :
Copenhagen
Print_ISBN :
0-7695-2424-9
DOI :
10.1109/DEXA.2005.108