• DocumentCode
    3264545
  • Title
    Building large ROLAP data cubes in parallel
  • Author
    Chen, Ying ; Dehne, Frank ; Eavis, Todd ; Rau-Chaplin, A.
  • Author_Institution
    Dalhousie Univ., Halifax, NS, Canada
  • fYear
    2004
  • fDate
    7-9 July 2004
  • Firstpage
    367
  • Lastpage
    377
  • Abstract
    The pre-computation of data cubes is critical to improving the response time of on-line analytical processing (OLAP) systems and can be instrumental in accelerating data mining tasks in large data warehouses. However, as the size of data warehouses grows, the time it takes to perform this pre-computation becomes a significant performance bottleneck. This work presents a fast parallel method for generating ROLAP data cubes on a shared-nothing multiprocessor based on a novel optimized data partitioning technique. Since no shared disk is required, this method can be applied on highly scalable processor clusters consisting of standard PCs with local disks, connected via a data switch. The approach taken, which uses a ROLAP representation of the data cube, is well suited to large data warehouses on high-dimensional data, and supports the generation of both fully materialized and partially materialized cubes. In comparison with previous approaches, our new method significantly improves scalability with respect to both the number of processors and the I/O bandwidth (number of parallel disks). We have implemented our new parallel shared-nothing data cube generation method and evaluated it on a PC cluster, exploring relative speedup, scaleup, sizeup, output sizes and data skew. For a fact table with 16 million rows and 8 attributes, our parallel data cube generation method achieves close to optimal speedup for as many as 32 processors, generating a full data cube in under 7 minutes. For a fact table with 256 million rows and 8 attributes, our parallel method achieves optimal speedup for 32 processors, generating a full data cube consisting of ≈7 billion rows (200 Gigabytes) in under 88 minutes.
  • Keywords
    data mining; data warehouses; parallel processing; OLAP systems; ROLAP data cubes; data partitioning; fast parallel method; online analytical processing; parallel shared-nothing data cube generation; processor clusters; shared-nothing multiprocessor; Acceleration; Bandwidth; Data mining; Data warehouses; Delay; Instruments; Optimization methods; Personal communication networks; Scalability; Switches
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database Engineering and Applications Symposium, 2004. IDEAS '04. Proceedings. International
  • ISSN
    1098-8068
  • Print_ISBN
    0-7695-2168-1
  • Type
    conf
  • DOI
    10.1109/IDEAS.2004.1319810
  • Filename
    1319810