Title :
A parallel scalable infrastructure for OLAP and data mining
Author :
Goil, Sanjay ; Choudhary, Alok
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Northwestern Univ., Evanston, IL, USA
Abstract :
Decision support systems are important in leveraging information present in data warehouses in businesses like banking, insurance, retail and health care. The multidimensional aspects of a business can be naturally expressed using a multidimensional data model. Data analysis and data mining on these warehouses pose new challenges for traditional database systems. OLAP and data mining operations require summary information on these multidimensional data sets. Query processing for these applications requires different views of the data for analysis and effective decision making. Data mining techniques can be applied in conjunction with OLAP for an integrated business solution. As data warehouses grow, parallel processing techniques have been applied to enable the use of larger data sets and reduce the time for analysis, thereby enabling evaluation of many more options for decision making. We address: (1) scalability in multidimensional systems for OLAP and multidimensional analysis; (2) integration of data mining with the OLAP framework; and (3) high performance by using parallel processing for OLAP and data mining. We describe our system, PARSIMONY (Parallel and Scalable Infrastructure for Multidimensional Online analytical processing). This platform is used for both OLAP and data mining. Sparsity in the data sets is handled with sparse chunks that use a bit-encoded sparse structure for compression. We present techniques that use the summary information available in data cubes for two data mining tasks: mining association rules and decision-tree-based classification. These techniques take advantage of the data organization provided by the multidimensional data model. Performance results for high-dimensional data sets on a distributed-memory parallel machine (IBM SP-2) show good speedup and scalability.
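The abstract mentions sparse chunks compressed with a bit-encoded sparse structure but does not give its layout. The following is a minimal sketch of one plausible reading, assuming each non-empty cell in a chunk is stored as a single integer key whose bit fields hold the cell's per-dimension offsets. The names `bits_needed`, `encode`, `decode`, and `compress_chunk` are hypothetical and are not taken from PARSIMONY.

```python
def bits_needed(size):
    """Bits required to store an offset in the range [0, size)."""
    return max(1, (size - 1).bit_length())


def encode(coords, dim_sizes):
    """Pack per-dimension offsets into one integer key (assumed layout)."""
    key = 0
    for offset, size in zip(coords, dim_sizes):
        if not 0 <= offset < size:
            raise ValueError(f"offset {offset} outside [0, {size})")
        # Shift the key left to make room, then append this dimension's bits.
        key = (key << bits_needed(size)) | offset
    return key


def decode(key, dim_sizes):
    """Recover the per-dimension offsets from a packed key."""
    coords = []
    # The last dimension sits in the lowest bits, so unpack in reverse.
    for size in reversed(dim_sizes):
        width = bits_needed(size)
        coords.append(key & ((1 << width) - 1))
        key >>= width
    return tuple(reversed(coords))


def compress_chunk(cells, dim_sizes):
    """Store only non-empty cells as a sorted list of (key, measure) pairs."""
    return sorted((encode(c, dim_sizes), v) for c, v in cells.items())


if __name__ == "__main__":
    dim_sizes = (4, 8, 16)  # chunk extent along each dimension
    cells = {(3, 5, 9): 12.5, (0, 1, 2): 7.0}  # two non-empty cells
    for key, measure in compress_chunk(cells, dim_sizes):
        print(key, decode(key, dim_sizes), measure)
```

In this sketch a 4 x 8 x 16 chunk needs 2 + 3 + 4 = 9 bits per key, so a sparse chunk costs one small integer plus one measure per non-empty cell instead of a dense array of 512 entries.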
Keywords :
business data processing; data mining; data models; data warehouses; decision support systems; parallel databases; parallel programming; query processing; IBM SP-2; OLAP; PARSIMONY; Parallel and Scalable Infrastructure for Multidimensional Online analytical processing; bit encoded sparse structure; business computing; data analysis; data compression; data organization; decision tree based classification; distributed memory parallel machine; high dimensional data sets; integrated business solution; larger data sets; multidimensional analysis; multidimensional data model; multidimensional data sets; multidimensional systems; parallel processing techniques; parallel scalable infrastructure; sparse chunks; summary information; Banking; Data analysis; Data mining; Data models; Data warehouses; Decision making; Decision support systems; Multidimensional systems; Parallel processing; Scalability
Conference_Title :
Database Engineering and Applications, 1999. IDEAS '99. International Symposium Proceedings
Conference_Location :
Montreal, Que.
Print_ISBN :
0-7695-0265-2
DOI :
10.1109/IDEAS.1999.787266