DocumentCode
1968546
Title
Leveraging 24/7 Availability and Performance for Distributed Real-Time Data Warehouses
Author
Santos, Ricardo Jorge ; Bernardino, Jorge ; Vieira, Marco
Author_Institution
DEI, Univ. of Coimbra, Coimbra, Portugal
fYear
2012
fDate
16-20 July 2012
Firstpage
654
Lastpage
659
Abstract
Real-time Data Warehouses (DWs) must be able to deal with continuous updates while ensuring 24/7 availability. To improve their performance, data is normally distributed using round-robin algorithms on clusters of shared-nothing machines. This paper proposes a solution for distributed DW databases that ensures their continuous availability and deals with frequent data loading requirements, while adding little performance overhead. We use a data striping and replication architecture to distribute portions of each fact table among pairs of slave nodes, where each slave node is an exact replica of its partner. This allows balancing query execution and replacing any defective node, ensuring the system's continuous availability. The size of each portion in a given node depends on that node's individual features, namely performance benchmark measures and dedicated database RAM. The estimated cost for executing each query workload in each slave node is also used for balancing query performance. We include experiments using the TPC-H decision support benchmark to evaluate the scalability of the proposed solution and show that it outperforms standard round-robin distributed DW setups.
Keywords
data warehouses; decision support systems; distributed databases; query processing; resource allocation; 24-7 availability; TPC-H decision support benchmark; continuous availability; continuous updates; data loading requirements; data striping; dedicated database RAM; distributed DW databases; distributed real-time data warehouses; fact table; performance benchmark measures; query execution balancing; query performance balancing; query workload; replication architecture; round-robin algorithms; shared-nothing machine cluster; slave nodes; Availability; Distributed databases; Hardware; Loading; Random access memory; Real-time systems; Real-time data warehousing; availability; data replication and redundancy; distributed and parallel databases; fault tolerance; load balancing; performance optimization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Software and Applications Conference (COMPSAC), 2012 IEEE 36th Annual
Conference_Location
Izmir
ISSN
0730-3157
Print_ISBN
978-1-4673-1990-4
Electronic_ISBN
0730-3157
Type
conf
DOI
10.1109/COMPSAC.2012.92
Filename
6340224