Title :
DWS-AQA: a cost effective approach for very large data warehouses
Author :
Bernardino, Jorge ; Furtado, Pedro ; Madeira, Henrique
Author_Institution :
ISEC - DEIS, Inst. Polytech. of Coimbra, Portugal
Abstract :
Data warehousing applications typically involve massive amounts of data that push database management technology to the limit. A scalable architecture is crucial, not only to handle very large amounts of data but also to ensure interactive response times for users. Large data warehouses require a very expensive setup, typically based on high-end servers or high-performance clusters. In this paper we propose and evaluate a simple but very effective method to implement a data warehouse using the computers and workstations typically available in large organizations. The proposed approach is called data warehouse striping with approximate query answering (DWS-AQA). The goal is to use the processing and disk capacity normally available in large workstation networks to implement a data warehouse at a very low infrastructure cost. Because the data warehouse shares computers that are also used for other purposes, most of the time only a fraction of the computers will be able to execute their partial queries in time. However, as we show in the paper, the approximate answers estimated from partial results have a very small error in most plausible scenarios. Moreover, because the data warehouse facts are partitioned in a strictly uniform way, it is possible to calculate tight confidence intervals for the approximate answers, giving the user a measure of the accuracy of the query results. A set of experiments on the TPC-H benchmark database is presented to show the accuracy of DWS-AQA for a large number of scenarios.
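The idea of estimating an answer from the nodes that respond in time can be illustrated with a minimal sketch. This is not the estimator from the paper, which the abstract does not specify. It assumes that each node holds an equal, uniformly striped share of the fact table, that the responding nodes behave like a simple random sample of all nodes, and that a normal approximation is acceptable. The function name `estimate_sum` and its parameters are hypothetical.

```python
import math
import statistics


def estimate_sum(partial_sums, total_nodes, z=1.96):
    """Estimate a global SUM from the partial sums of the responding nodes.

    Assumption (not from the paper): responding nodes are treated as a
    simple random sample, without replacement, of `total_nodes` nodes.
    Returns (estimate, half_width) of an approximate confidence interval.
    """
    n = len(partial_sums)
    if n == 0:
        raise ValueError("no node responded in time")
    if n > total_nodes:
        raise ValueError("more responses than nodes")

    # Scale the mean partial result up to the full set of nodes.
    estimate = total_nodes * statistics.fmean(partial_sums)

    if n == total_nodes:
        return estimate, 0.0       # every node answered: exact result
    if n == 1:
        return estimate, math.inf  # variance cannot be estimated

    s = statistics.stdev(partial_sums)  # sample standard deviation
    # Finite population correction: the error shrinks as n approaches N.
    fpc = math.sqrt((total_nodes - n) / total_nodes)
    # z = 1.96 gives a roughly 95% interval; for small n a Student-t
    # quantile would be a more careful choice.
    half_width = z * total_nodes * s / math.sqrt(n) * fpc
    return estimate, half_width


if __name__ == "__main__":
    # Hypothetical partial sums from 5 of 10 nodes.
    est, hw = estimate_sum([100.0, 102.0, 98.0, 101.0, 99.0], total_nodes=10)
    print(f"estimated SUM = {est:.1f} +/- {hw:.1f}")
```

With the hypothetical values above, five of ten nodes respond, the estimate is 1000, and the half-width is about 9.8. Under this sketch's assumptions, uniform striping keeps the per-node partial sums similar, which is what makes the interval narrow.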
Keywords :
data warehouses; query processing; workstation clusters; DWS-AQA; data warehouse striping with approximate query answering; disk capacity; infrastructure cost; interactive response time; large workstation networks; partial queries; processing capacity; scalable architecture; tight confidence intervals; very large data warehouses; Computer networks; Concurrent computing; Costs; Data warehouses; Databases; Delay; Technology management; Time sharing computer systems; Warehousing; Workstations;
Conference_Title :
Database Engineering and Applications Symposium, 2002. Proceedings. International
Print_ISBN :
0-7695-1638-6
DOI :
10.1109/IDEAS.2002.1029676