Title :
Leveraging 24/7 Availability and Performance for Distributed Real-Time Data Warehouses
Author :
Santos, Ricardo Jorge ; Bernardino, Jorge ; Vieira, Marco
Author_Institution :
DEI, Univ. of Coimbra, Coimbra, Portugal
Abstract :
Real-time Data Warehouses (DWs) must handle continuous updates while ensuring 24/7 availability. Their performance is usually improved by distributing data with round-robin algorithms across clusters of shared-nothing machines. This paper proposes a solution for distributed DW databases that ensures their continuous availability and meets frequent data loading requirements, with a small performance overhead. We use a data striping and replication architecture that distributes portions of each fact table among pairs of slave nodes, where each slave node is an exact replica of its partner. This allows query execution to be balanced and any defective node to be replaced, ensuring the system's continuous availability. The size of the portion stored on a given node depends on its individual features, namely performance benchmark measures and the RAM dedicated to the database. The estimated cost of executing each query workload on each slave node is also used to balance query performance. Experiments using the TPC-H decision support benchmark evaluate the scalability of the proposed solution and show that it outperforms standard round-robin distributed DW setups.
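The following minimal Python sketch illustrates the two balancing ideas described in the abstract. It is not the authors' implementation. It assumes a hypothetical capacity score, a weighted mix of a normalized benchmark score and normalized dedicated database RAM, and uses that score to size each node pair's share of a fact table. Each query's per-pair work is then routed to whichever replica in the pair has the smaller pending estimated cost, falling back to the partner replica when one node is down. The names `NodePair`, `portion_sizes`, `route`, and the `ram_weight` parameter are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class NodePair:
    name: str
    benchmark_score: float  # hypothetical: higher means faster (e.g. from a local benchmark run)
    db_ram_gb: float        # RAM dedicated to the DBMS on each node of the pair


def portion_sizes(pairs, total_rows, ram_weight=0.5):
    """Split total_rows of a fact table across node pairs in proportion to a
    hypothetical blended capacity score, instead of equal round-robin shares."""
    max_bench = max(p.benchmark_score for p in pairs)
    max_ram = max(p.db_ram_gb for p in pairs)
    scores = [
        (1 - ram_weight) * p.benchmark_score / max_bench + ram_weight * p.db_ram_gb / max_ram
        for p in pairs
    ]
    total = sum(scores)
    raw = [total_rows * s / total for s in scores]
    sizes = [int(r) for r in raw]  # floor of each proportional share
    leftover = total_rows - sum(sizes)
    # Largest-remainder rounding: give the leftover rows to the biggest fractional parts.
    order = sorted(range(len(pairs)), key=lambda i: raw[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return {p.name: n for p, n in zip(pairs, sizes)}


def route(costs, pending, down=frozenset()):
    """For each pair, send that pair's share of the query to the live replica
    (index 0 or 1) with the smaller pending estimated cost, then add the cost."""
    chosen = {}
    for pair, cost in costs.items():
        candidates = [i for i in (0, 1) if (pair, i) not in down]
        if not candidates:
            raise RuntimeError(f"both replicas of pair {pair} are unavailable")
        idx = min(candidates, key=lambda i: pending[pair][i])  # ties go to replica 0
        pending[pair][idx] += cost
        chosen[pair] = idx
    return chosen
```

For example, with pair `A` (benchmark 100, 8 GB) and pair `B` (benchmark 50, 4 GB), `portion_sizes` assigns 600 of 900 rows to `A` and 300 to `B`. Because both nodes in a pair hold the same rows, `route` can move work between them freely, and when one node is marked down it sends that pair's work to the partner.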
Keywords :
data warehouses; decision support systems; distributed databases; query processing; resource allocation; 24-7 availability; TPC-H decision support benchmark; continuous availability; continuous updates; data loading requirements; data striping; dedicated database RAM; distributed DW databases; distributed real-time data warehouses; fact table; performance benchmark measures; query execution balancing; query performance balancing; query workload; replication architecture; round-robin algorithms; shared-nothing machine cluster; slave nodes; Availability; Distributed databases; Hardware; Loading; Random access memory; Real-time systems; Real-time data warehousing; availability; data replication and redundancy; distributed and parallel databases; fault tolerance; load balancing; performance optimization;
Conference_Title :
Computer Software and Applications Conference (COMPSAC), 2012 IEEE 36th Annual
Conference_Location :
Izmir
Print_ISBN :
978-1-4673-1990-4
ISSN :
0730-3157
DOI :
10.1109/COMPSAC.2012.92