• DocumentCode
    2492407
  • Title

    ParaCube: A Scalable OLAP Model Based on Distributed Aggregate Computing with Sibling Cubes

  • Author

    Zhang, Yansong ; Wang, Shan ; Huang, Wei

  • Author_Institution
    Key Lab. of the Minist. of Educ. for Data Eng. & Knowledge Eng., Beijing, China
  • fYear
    2010
  • fDate
    6-8 April 2010
  • Firstpage
    323
  • Lastpage
    329
  • Abstract
    The requirements of OLAP applications are growing rapidly, driven by dramatic increases in data volume, number of users, query volume, and query complexity. The need to shorten the update period in the data warehouse is another crucial factor for a scalable OLAP application. In this paper, we propose a scalable OLAP prototype that supports query processing over growing data volumes by distributing the fact tuples across multiple servers to construct a set of sibling cubes, which can be merged to obtain the whole cube. We employ a lightweight distribution policy in which dimension tables are fully duplicated on each sibling server, based on the observation that dimension tables account for a very low proportion of the space cost. An OLAP query with distributed aggregate functions can be transformed into queries that are executed in parallel on the sibling servers. For non-distributed aggregate functions, such as median, an optimized median aggregate computing algorithm is proposed to reduce the transmission volume between servers while computing the global median values. We also present a three-level data warehouse framework to meet the requirement of a shorter update period in "operational business intelligence". An asynchronous tunnel model is proposed to reduce update latency by pre-fetching updated tuples to the OLAP processing server. Finally, we set up the prototype system ParaCube to evaluate performance on SN (shared-nothing) systems and multi-core platforms.
  • Keywords
    data mining; data warehouses; distributed processing; query processing; ParaCube; asynchronous tunnel model; data volume; data warehouse; distributed aggregate computing; multicore platforms; nondistributed computing aggregate functions; operational business intelligence; optimized median aggregate computing algorithm; query complexity; query processing; query volume; scalable OLAP model; shared-nothing system; sibling cubes; Acceleration; Aggregates; Application software; Concurrent computing; Data warehouses; Distributed computing; Material storage; Merging; Prototypes; Query processing; ParaCube; distributed aggregate; median; sibling cube;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Conference (APWEB), 2010 12th International Asia-Pacific
  • Conference_Location
    Busan
  • Print_ISBN
    978-0-7695-4012-2
  • Electronic_ISBN
    978-1-4244-6600-9
  • Type

    conf

  • DOI
    10.1109/APWeb.2010.31
  • Filename
    5474121
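
The sibling-cube idea in the abstract can be illustrated with a minimal sketch. This is not the ParaCube implementation: the function names, the (sum, count) cell layout, and the sample rows are assumptions made for illustration only. Each server aggregates its own partition of the fact tuples into a sibling cube, and the whole cube is obtained by merging the partial cells.

```python
from collections import defaultdict

def build_sibling_cube(fact_rows):
    """Aggregate one server's fact partition into {group_key: (sum, count)}.

    Hypothetical helper: fact_rows is an iterable of (group_key, measure).
    """
    cube = defaultdict(lambda: (0.0, 0))
    for key, measure in fact_rows:
        s, c = cube[key]
        cube[key] = (s + measure, c + 1)
    return dict(cube)

def merge_sibling_cubes(cubes):
    """Merge sibling cubes by adding partial sums and counts per group key."""
    merged = defaultdict(lambda: (0.0, 0))
    for cube in cubes:
        for key, (s, c) in cube.items():
            ms, mc = merged[key]
            merged[key] = (ms + s, mc + c)
    return dict(merged)

def averages(cube):
    """Derive AVG from the merged SUM and COUNT of each cell."""
    return {key: s / c for key, (s, c) in cube.items()}

# Two made-up fact partitions, as if stored on two sibling servers.
partition_a = [("2010-Q1", 10.0), ("2010-Q1", 20.0), ("2010-Q2", 5.0)]
partition_b = [("2010-Q1", 30.0), ("2010-Q2", 15.0)]

merged = merge_sibling_cubes(
    [build_sibling_cube(partition_a), build_sibling_cube(partition_b)]
)
```

Merging works here because SUM and COUNT can be combined from partial results. A median cannot be recovered from per-server medians in the same way, which is why the abstract treats it with a separate algorithm.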