Regional Information Center for Science and Technology - Impact of data distribution, level of parallelism, and communication frequency on parallel data cube construction

DocumentCode :

1660965

Title :

Impact of data distribution, level of parallelism, and communication frequency on parallel data cube construction

Author :

Yang, Ge ; Jin, Ruoming ; Agrawal, Gagan

Author_Institution :

Dept. of Comput. & Inf. Sci., Ohio State Univ., Columbus, OH, USA

fYear :

2003

Abstract :

Data cube construction is a commonly used operation in data warehouses. Because of the volume of data stored and analyzed in a data warehouse and the amount of computation involved in data cube construction, it is natural to consider parallel machines for this operation. We have developed a set of parallel algorithms for data cube construction using a new data structure called the aggregation tree. Our experience has shown that a number of performance trade-offs arise in developing a parallel data cube implementation. We focus on three important issues: (1) data distribution, i.e., how the original array is distributed among the processors; (2) level of parallelism, i.e., which parts of the computation are parallelized and which are performed sequentially; and (3) frequency of communication, i.e., whether the implementation performs frequent interprocessor communication (using less memory) or less frequent communication (using more memory). We present a detailed experimental study evaluating these trade-offs, considering parallel data cube construction with different cube sizes and sparsity levels. Our experimental results show the following: (1) in all cases, reducing the frequency of communication and using more memory gave better performance, though the difference was relatively small; (2) choosing the data distribution to minimize communication volume made a substantial difference in performance in most cases; and (3) using parallelism at all levels gave better performance, even though it increases the total communication volume.
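As a rough illustration of the ideas in the abstract, the sketch below computes every group-by of a small dense array by deriving each one from a parent with one more dimension. It then block-distributes the array along one dimension to show why data distribution affects communication. All names (build_array, construct_cube, block_partition, elements_to_reduce) are hypothetical. The parent-selection rule is a simple assumption and is NOT the authors' aggregation tree. The communication count is a hypothetical cost model: every output cell held by c simulated processors costs c - 1 transfers. Under that model, dropping the partitioned dimension requires combining partial results across processors, while dropping any other dimension stays local.

```python
from itertools import combinations, product
from typing import Callable, Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]
GroupBy = Dict[Cell, float]


def build_array(shape: Sequence[int], fill: Callable[[Cell], float]) -> GroupBy:
    """Materialize a dense n-dimensional array as {index tuple: value}."""
    return {idx: fill(idx) for idx in product(*(range(n) for n in shape))}


def drop_dimension(parent: GroupBy, parent_dims: Tuple[int, ...], dim: int) -> GroupBy:
    """Sum a group-by along one of its dimensions, yielding a child group-by."""
    pos = parent_dims.index(dim)
    child: GroupBy = {}
    for key, value in parent.items():
        child_key = key[:pos] + key[pos + 1:]
        child[child_key] = child.get(child_key, 0) + value
    return child


def construct_cube(array: GroupBy, ndims: int) -> Dict[Tuple[int, ...], GroupBy]:
    """Compute all 2**ndims group-bys, keyed by the tuple of retained dimensions.

    Each child is computed from a parent holding exactly one more dimension
    (here: the lowest-numbered missing one). This fixed rule is an
    illustrative assumption, not the paper's aggregation tree.
    """
    full = tuple(range(ndims))
    cube = {full: dict(array)}
    for k in range(ndims - 1, -1, -1):
        for dims in combinations(full, k):
            missing = next(d for d in full if d not in dims)
            parent_dims = tuple(sorted(dims + (missing,)))
            cube[dims] = drop_dimension(cube[parent_dims], parent_dims, missing)
    return cube


def block_partition(array: GroupBy, dim: int, extent: int, p: int) -> List[GroupBy]:
    """Block-distribute the array along `dim` among p (simulated) processors."""
    block = -(-extent // p)  # ceiling division
    parts: List[GroupBy] = [{} for _ in range(p)]
    for idx, value in array.items():
        parts[idx[dim] // block][idx] = value
    return parts


def elements_to_reduce(parts: List[GroupBy], ndims: int, dim: int) -> int:
    """Count partial results that must be sent to combine a first-level group-by.

    Hypothetical cost model: each processor drops `dim` locally, then every
    output cell held by c processors costs c - 1 element transfers.
    """
    full = tuple(range(ndims))
    holders: Dict[Cell, int] = {}
    for part in parts:
        for key in drop_dimension(part, full, dim):
            holders[key] = holders.get(key, 0) + 1
    return sum(c - 1 for c in holders.values())


if __name__ == "__main__":
    shape = (4, 3, 2)
    arr = build_array(shape, lambda idx: 1)
    cube = construct_cube(arr, len(shape))
    print(len(cube), cube[()][()])  # 8 group-bys, grand total 24
    parts = block_partition(arr, dim=0, extent=shape[0], p=2)
    print(elements_to_reduce(parts, 3, dim=0), elements_to_reduce(parts, 3, dim=1))  # 6 0
```

In this toy run, dropping the partitioned dimension 0 forces 6 partial sums to be combined across the two simulated processors, while dropping dimension 1 needs none. This is the kind of asymmetry that makes the choice of data distribution matter. The actual volumes in the paper's algorithms are not reproduced here.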

Keywords :

data warehouses; parallel algorithms; software performance evaluation; tree data structures; aggregation tree; communication frequency; communication volume; data distribution; data structure; interprocessor communication; parallel data cube construction; parallel machines; parallelism level; performance trade-offs; Aggregates; Companies; Concurrent computing; Data analysis; Data warehouses; Distributed computing; Frequency; Parallel algorithms; Parallel processing; Performance analysis

fLanguage :

English

Publisher :

IEEE

Conference_Title :

International Parallel and Distributed Processing Symposium, 2003. Proceedings.

ISSN :

1530-2075

Print_ISBN :

0-7695-1926-1

Type :

conf

DOI :

10.1109/IPDPS.2003.1213162

Filename :

1213162

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1660965