DocumentCode :
1295773
Title :
Efficient algorithms for block-cyclic array redistribution between processor sets
Author :
Park, Neungsoo ; Prasanna, Viktor K. ; Raghavendra, Cauligi S.
Author_Institution :
Dept. of Electr. Eng. Syst., Univ. of Southern California, Los Angeles, CA, USA
Volume :
10
Issue :
12
fYear :
1999
fDate :
12/1/1999 12:00:00 AM
Firstpage :
1217
Lastpage :
1240
Abstract :
Run-time array redistribution is necessary to enhance the performance of parallel programs on distributed memory supercomputers. In this paper, we present an efficient algorithm for array redistribution from cyclic(x) on P processors to cyclic(Kx) on Q processors. The algorithm reduces the overall time for communication by considering the data transfer, communication schedule, and index computation costs. The proposed algorithm is based on a generalized circulant matrix formalism. Our algorithm generates a schedule that minimizes the number of communication steps and eliminates node contention in each communication step. The network bandwidth is fully utilized by ensuring that equal-sized messages are transferred in each communication step. Furthermore, the time to compute the schedule and the index sets is significantly smaller. It takes O(max(P, Q)) time and is less than 1 percent of the data transfer time. In comparison, the schedule computation time using the state-of-the-art scheme (which is based on the bipartite matching scheme) is 10 to 50 percent of the data transfer time for similar problem sizes. Therefore, our proposed algorithm is suitable for run-time array redistribution. To evaluate the performance of our scheme, we have implemented the algorithm using C and MPI on an IBM SP2. Results show that our algorithm performs better than the previous algorithms with respect to the total redistribution time, which includes the time for data transfer, schedule, and index computation.
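To make the redistribution problem in the abstract concrete, the following is a minimal sketch of which source processor must send how many elements to which destination processor when an array moves from cyclic(x) on P processors to cyclic(Kx) on Q processors. It is not the circulant-matrix scheduling algorithm of the paper; it only enumerates ownership by brute force over one period of the pattern. The function names `owner` and `comm_counts` are assumptions introduced for illustration, and the ownership rule used is the usual block-cyclic one (element i belongs to processor (i div block) mod nprocs).

```python
from math import gcd


def owner(i, block, nprocs):
    """Processor owning global element i under a cyclic(block) distribution
    over nprocs processors (standard block-cyclic rule, assumed here)."""
    return (i // block) % nprocs


def comm_counts(x, K, P, Q):
    """Count elements each source processor sends to each destination when
    redistributing from cyclic(x) on P to cyclic(K*x) on Q.

    Only one period of the ownership pattern is scanned; the pattern then
    repeats for the rest of the array. This is a brute-force illustration,
    not the schedule-generation method described in the abstract.
    """
    src_span = P * x          # elements covered by one round over the sources
    dst_span = Q * K * x      # elements covered by one round over the destinations
    period = src_span * dst_span // gcd(src_span, dst_span)  # least common multiple

    counts = [[0] * Q for _ in range(P)]
    for i in range(period):
        counts[owner(i, x, P)][owner(i, K * x, Q)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical small case: cyclic(2) on 2 processors to cyclic(4) on 3 processors.
    for row in comm_counts(x=2, K=2, P=2, Q=3):
        print(row)
```

In this small case every source sends the same number of elements to every destination, which is the kind of equal-sized message pattern the abstract says the proposed schedule exploits; other parameter choices may leave some entries at zero.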
Keywords :
distributed memory systems; parallel algorithms; processor scheduling; resource allocation; IBM SP2; bipartite matching scheme; block-cyclic array redistribution; communication schedule; data transfer; distributed memory supercomputers; index computation costs; network bandwidth; processor sets; run-time array redistribution; Computer Society; Concurrent computing; Costs; Distributed computing; Multidimensional signal processing; Processor scheduling; Runtime; Scheduling algorithm; Signal processing algorithms; Supercomputers;
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/71.819945
Filename :
819945