• DocumentCode
    1999295
  • Title
    High Efficiency of Hybrid Resumption in Distributed Data Warehouses
  • Author
    Gorawski, Marcin; Marks, Pawel
  • Author_Institution
    Inst. of Comput. Sci., Silesian Univ. of Technol., Gliwice
  • fYear
    2005
  • fDate
    26 Aug. 2005
  • Firstpage
    323
  • Lastpage
    327
  • Abstract
    ETL processes are sometimes interrupted by a failure. In such a case, one of the algorithms for resuming interrupted extraction is usually applied. In this paper we present a modified Design-Resume (DR) algorithm, extended to handle ETL processes that contain many loading nodes, and we use it to resume the load process of a distributed data warehouse. The key feature of the DR algorithm is that it imposes no additional overhead on the normal ETL process. We combine the modified algorithm with a staging technique, which increases the efficiency of the resumption process, and call the result the hybrid resumption algorithm. Based on the results of the tests performed, we discuss the benefits of our improvements.
  • Keywords
    data warehouses; distributed databases; ETL processes; distributed data warehouse load process; hybrid resumption algorithm; interrupted extraction resumption algorithms; modified Design-Resume algorithm; Algorithm design and analysis; Checkpointing; Computer science; Data mining; Data warehouses; Hardware; Java; Performance evaluation; Resumes; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Database and Expert Systems Applications, 2005. Proceedings. Sixteenth International Workshop on
  • Conference_Location
    Copenhagen
  • ISSN
    1529-4188
  • Print_ISBN
    0-7695-2424-9
  • Type
    conf
  • DOI
    10.1109/DEXA.2005.108
  • Filename
    1508293