Efficient, Chunk-Replicated Node Partitioned Data Warehouses

Author

Furtado, Pedro

fYear

2008

fDate

10-12 Dec. 2008

Firstpage

578

Lastpage

583

Abstract

Much has been said about processing data efficiently in parallel database servers, and some data warehouse applications must process on the order of tens to hundreds of gigabytes efficiently. Yet there is no effective approach targeted at using non-dedicated, low-cost platforms efficiently in this context. Imagine putting together 10 or 1000 commodity PCs and setting up a data-crunching platform for large database-resident data with acceptable performance. There are significant, interrelated data layout and processing challenges when the computational, storage and network hardware is heterogeneous and slow. We propose how to place, replicate and load-balance the data efficiently in this context. This work innovates in several respects: it is practically as fast as full mirroring without the overhead, and it exploits the schema, chunk-wise placement, replication and load-balanced processing to be faster and more flexible than previous efforts. Our findings are complemented by an evaluation using TPC-H performance benchmark queries.

Keywords

data warehouses; parallel databases; TPC-H performance benchmark queries; chunk-replicated node partitioned data warehouses; data layout; data processing; load-balanced processing; parallel database servers; Computer networks; Data warehouses; Distributed databases; Distributed processing; Hardware; Image databases; Parallel processing; Personal communication networks; Relational databases; Switches; performance

fLanguage

English

Publisher

IEEE

Conference_Titel

Parallel and Distributed Processing with Applications, 2008. ISPA '08. International Symposium on

Conference_Location

Sydney, NSW

Print_ISBN

978-0-7695-3471-8

Type

conf

DOI

10.1109/ISPA.2008.86

Filename

4725197