• DocumentCode
    659464
  • Title
    A distributed tree data structure for real-time OLAP on cloud architectures
  • Author
    Dehne, F. ; Kong, Qing-Jie ; Rau-Chaplin, Andrew ; Zaboli, H. ; Zhou, Rui
  • Author_Institution
    Sch. of Comput. Sci., Carleton Univ., Ottawa, ON, Canada
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    499
  • Lastpage
    505
  • Abstract
    In contrast to queries for on-line transaction processing (OLTP) systems, which typically access only a small portion of a database, OLAP queries may need to aggregate large portions of a database, which often leads to performance issues. In this paper we introduce CR-OLAP, a cloud-based real-time OLAP system built on a new distributed index structure for OLAP, the distributed PDCR tree, that utilizes a cloud infrastructure consisting of (m + 1) multi-core processors. As the database size increases, CR-OLAP dynamically increases m to maintain performance. Our distributed PDCR tree data structure supports multiple dimension hierarchies and efficient query processing on the elaborate dimension hierarchies that are so central to OLAP systems. It is particularly efficient for complex OLAP queries that need to aggregate large portions of the data warehouse, such as “report the total sales in all stores located in California and New York during the months February-May of all years”. We evaluated CR-OLAP on the Amazon EC2 cloud, using the TPC-DS benchmark data set. The tests demonstrate that CR-OLAP scales well with an increasing number of processors, even for complex queries. For example, on an Amazon EC2 cloud instance with eight processors, for a TPC-DS OLAP query stream on a data warehouse with 80 million tuples where every OLAP query aggregates more than 50% of the database, CR-OLAP achieved a query latency of 0.3 seconds, which can be considered a real-time response.
  • Keywords
    cloud computing; data mining; data warehouses; multiprocessing systems; query processing; software architecture; tree data structures; Amazon EC2 cloud; CR-OLAP; California; New York; TPC-DS OLAP query stream; TPC-DS benchmark data set; cloud architectures; cloud infrastructure; cloud-based real-time OLAP system; data warehouses; database size; dimension hierarchies; distributed PDCR tree data structure; distributed index structure; multicore processors; multiple dimension hierarchies; online transaction processing systems; performance issues; performance maintenance; query latency; query processing; real-time OLAP query aggregation; Aggregates; Arrays; Data warehouses; Databases; Program processors; Real-time systems; Vegetation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data, 2013 IEEE International Conference on
  • Conference_Location
    Silicon Valley, CA
  • Type
    conf
  • DOI
    10.1109/BigData.2013.6691613
  • Filename
    6691613