DocumentCode
774150
Title
Optimizing Communications of Dynamic Data Redistribution on Symmetrical Matrices in Parallelizing Compilers
Author
Hsu, Ching-Hsien ; Chen, Ming-Hao ; Yang, Chao-Tung ; Li, Kuan-Ching
Author_Institution
Dept. of Comput. Sci. & Inf. Eng., Chung Hua Univ., Hsinchu
Volume
17
Issue
11
fYear
2006
Firstpage
1226
Lastpage
1241
Abstract
Dynamic data redistribution is used to enhance data locality and algorithm performance by reducing interprocessor communication in many parallel scientific applications on distributed-memory multicomputers. Since the redistribution is performed at runtime, there is a performance tradeoff between the efficiency of the new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. In this paper, we present a processor replacement scheme to minimize the cost of interprocessor data exchange during runtime. The main idea of the proposed technique is to develop a replacement function for reordering logical processors in the destination phase. Based on the replacement function, a realigned sequence of destination processors can be derived and is then used to perform data decomposition in the receiving phase. Together with local matrix transposition and compressed CRS vector transposition schemes, interprocessor communication can be eliminated during runtime. A significant advantage of this approach is that, in special cases, the realignment of data can be performed without interprocessor communication. The second contribution of the technique is that the complicated generation of communication sets can be simplified by applying local matrix transposition; consequently, the indexing cost can be reduced significantly. The proposed techniques can be applied in both dense and sparse applications. A generalized symmetric redistribution algorithm is also presented in this work. The theoretical analysis proves that up to (p-1)/p of the data transmission cost can be saved. For general cases, the symmetric redistribution algorithm saves 1/p of the communication overhead compared with the traditional method. Experimental results also show that the proposed techniques provide superior performance in most data redistribution instances.
Keywords
distributed algorithms; electronic data interchange; matrix algebra; parallelising compilers; vectors; CRS vector transposition scheme; data decomposition; data locality enhancement; dynamic data redistribution; interprocessor communication; interprocessor data exchange; local matrix transposition; parallelizing compilers; processor replacement scheme; symmetric redistribution algorithm; symmetrical matrices; Chaotic communication; Computer Society; Computer science; Costs; Optimizing compilers; Parallel programming; Programming profession; Runtime; Sparse matrices; Symmetric matrices; CRS transposition; Processor replacement; communication free; data redistribution; sparse matrix; symmetric matrix
fLanguage
English
Journal_Title
Parallel and Distributed Systems, IEEE Transactions on
Publisher
IEEE
ISSN
1045-9219
Type
jour
DOI
10.1109/TPDS.2006.162
Filename
1705461