DocumentCode :
1494559
Title :
Synthesizing efficient out-of-core programs for block recursive algorithms using block-cyclic data distributions
Author :
Li, Zhiyong ; Reif, John H. ; Gupta, Sandeep K S
Author_Institution :
Network Comput. Software Div., IBM Corp., Research Triangle Park, NC, USA
Volume :
10
Issue :
3
fYear :
1999
fDate :
3/1/1999
Firstpage :
297
Lastpage :
315
Abstract :
In this paper, we present a framework for synthesizing I/O-efficient out-of-core programs for block recursive algorithms, such as the fast Fourier transform (FFT) and block matrix transposition algorithms. Our framework uses an algebraic representation based on tensor products and other matrix operations. The programs are optimized for the striped Vitter and Shriver's two-level memory model, in which data can be distributed using various cyclic(B) distributions, in contrast to the normally used physical track distribution cyclic(B_d), where B_d is the physical disk block size. We first introduce tensor bases to capture the semantics of block-cyclic distributions of out-of-core data and of the access patterns to that data. We then present program generation techniques for tensor products and matrix transposition. We accurately represent the number of parallel I/O operations required by the synthesized programs for tensor products and matrix transposition as a function of the tensor bases and data distributions. We introduce an algorithm to determine the data distribution that optimizes the performance of the synthesized programs. Further, we formalize the procedure of synthesizing efficient out-of-core programs for tensor product formulas with various block-cyclic distributions as a dynamic programming problem. We demonstrate the effectiveness of our approach through several examples. We show that choosing an appropriate data distribution can reduce the number of passes needed to access out-of-core data by as much as a factor of eight for a tensor product, and that the dynamic programming approach can substantially reduce the number of passes needed to access out-of-core data for the overall tensor product formulas.
Keywords :
application generators; automatic programming; dynamic programming; fast Fourier transforms; algebraic representation; block matrix transposition algorithms; block recursive algorithms; block-cyclic data distributions; dynamic programming; efficient out-of-core programs synthesis; fast Fourier transform; matrix transposition; physical disk block size; physical track distribution cyclic; program generation techniques; tensor products; two-level memory model; Application software; Communication networks; Computer applications; Concurrent computing; Costs; Dynamic programming; Fast Fourier transforms; Network synthesis; Programmable logic arrays; Tensile stress;
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/71.755830
Filename :
755830