DocumentCode :
659464
Title :
A distributed tree data structure for real-time OLAP on cloud architectures
Author :
Dehne, F. ; Kong, Qing-Jie ; Rau-Chaplin, Andrew ; Zaboli, H. ; Zhou, Rui
Author_Institution :
Sch. of Comput. Sci., Carleton Univ., Ottawa, ON, Canada
fYear :
2013
fDate :
6-9 Oct. 2013
Firstpage :
499
Lastpage :
505
Abstract :
In contrast to queries for on-line transaction processing (OLTP) systems, which typically access only a small portion of a database, OLAP queries may need to aggregate large portions of a database, which often leads to performance issues. In this paper we introduce CR-OLAP, a Cloud based Real-time OLAP system built on a new distributed index structure for OLAP, the distributed PDCR tree, that utilizes a cloud infrastructure consisting of (m + 1) multi-core processors. With increasing database size, CR-OLAP dynamically increases m to maintain performance. Our distributed PDCR tree data structure supports multiple dimension hierarchies and efficient query processing on the elaborate dimension hierarchies that are so central to OLAP systems. It is particularly efficient for complex OLAP queries that need to aggregate large portions of the data warehouse, such as “report the total sales in all stores located in California and New York during the months February-May of all years”. We evaluated CR-OLAP on the Amazon EC2 cloud, using the TPC-DS benchmark data set. The tests demonstrate that CR-OLAP scales well with an increasing number of processors, even for complex queries. For example, on an Amazon EC2 cloud instance with eight processors, for a TPC-DS OLAP query stream on a data warehouse with 80 million tuples where every OLAP query aggregates more than 50% of the database, CR-OLAP achieved a query latency of 0.3 seconds, which can be considered a real-time response.
Keywords :
cloud computing; data mining; data warehouses; multiprocessing systems; query processing; software architecture; tree data structures; Amazon EC2 cloud; CR-OLAP; California; New York; TPC-DS OLAP query stream; TPC-DS benchmark data set; cloud architectures; cloud infrastructure; cloud-based real-time OLAP system; data warehouses; database size; dimension hierarchies; distributed PDCR tree data structure; distributed index structure; multicore processors; multiple dimension hierarchies; online transaction processing systems; performance issues; performance maintenance; query latency; query processing; real-time OLAP query aggregation; Aggregates; Arrays; Data warehouses; Databases; Program processors; Real-time systems; Vegetation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
Type :
conf
DOI :
10.1109/BigData.2013.6691613
Filename :
6691613