Title :
Optimizing ETL workflows for fault-tolerance
Author :
Simitsis, Alkis ; Wilkinson, Kevin ; Dayal, Umeshwar ; Castellanos, Malu
Author_Institution :
HP Labs., Palo Alto, CA, USA
Abstract :
Extract-Transform-Load (ETL) processes play an important role in data warehousing. Typically, ETL design has focused on performance as the sole metric, to ensure that the ETL process finishes within an allocated time window. However, other quality metrics are also important and need to be considered during ETL design. In this paper, we address ETL design for performance plus fault-tolerance and freshness. An ETL process can fail for many reasons, and a good design needs to guarantee that it can be recovered within the ETL time window. Making ETL robust to failures is not trivial: several strategies can be used, and each has different costs and benefits. In addition, other metrics can affect the choice of a strategy; e.g., higher freshness reduces the time window available for recovery. The design space is too large for informal, ad-hoc approaches. We describe our QoX optimizer, which considers multiple design strategies and finds an ETL design that satisfies multiple objectives. In particular, we define the optimizer search space, cost functions, and search algorithms. We also illustrate its use through several experiments and show that it produces designs that are very near optimal.
Keywords :
data warehouses; fault tolerant computing; matrix algebra; optimisation; ETL design; ETL workflows optimization; QoX optimizer; cost functions; data warehousing; extract-transform-load processes; fault tolerance; optimizer search space; quality metrics; search algorithms; sole metric; Availability; Cost function; Data mining; Data warehouses; Design optimization; Fault tolerance; Maintenance; Robustness; Scalability; Warehousing;
Conference_Title :
Data Engineering (ICDE), 2010 IEEE 26th International Conference on
Conference_Location :
Long Beach, CA
Print_ISBN :
978-1-4244-5445-7
Electronic_ISBN :
978-1-4244-5444-0
DOI :
10.1109/ICDE.2010.5447816