Title :
An efficient code generation technique for tiled iteration spaces
Author :
Goumas, G. ; Athanasaki, M. ; Koziris, N.
Author_Institution :
Comput. Syst. Lab., Nat. Tech. Univ. of Athens, Greece
Abstract :
This paper presents a novel approach to the problem of generating code for nested for-loops transformed by a tiling transformation. Tiling, or supernode transformation, has been widely used to improve locality in multilevel memory hierarchies, as well as to execute loops efficiently on parallel architectures. However, automatic code generation for tiled loops can be a very complex compiler task, especially when nonrectangular tile shapes and iteration space bounds are concerned. Our method considerably enhances previous work on rewriting tiled loops by considering parallelepiped tiles and arbitrary iteration space shapes. In order to generate tiled code, we first enumerate all tiles containing points within the iteration space and, second, sweep all points within each tile. For the first subproblem, we refine previous results concerning the computation of new loop bounds of an iteration space that has been transformed by a nonunimodular transformation. For the second subproblem, we transform the initial parallelepiped tile into a rectangular one, so that efficient code can be generated with the aid of a nonunimodular transformation matrix and its Hermite Normal Form (HNF). Experimental results show that the proposed method significantly accelerates the compilation process and generates much more efficient code.
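To make the two subproblems concrete, the following is a minimal reference sketch in Python. It is not the paper's method: it handles only a 2-D rectangular iteration space, finds candidate tiles from the corners of the space, and sweeps each tile by scanning its bounding box with a membership test, which is the kind of inefficiency the HNF-based approach is meant to avoid. The names `tile_of` and `tiled_points`, and the convention that the columns of `P` are the tile side vectors (so the tiling matrix is H = P^-1), are assumptions made for illustration.

```python
from itertools import product


def tile_of(point, adj, det):
    """Tile coordinates floor(H p), where H = adj / det, in exact integer math."""
    x, y = point
    return ((adj[0][0] * x + adj[0][1] * y) // det,
            (adj[1][0] * x + adj[1][1] * y) // det)


def tiled_points(P, n_rows, n_cols):
    """Visit every point of the space [0, n_rows) x [0, n_cols) tile by tile.

    P is a 2x2 integer matrix whose columns are the tile side vectors
    (hypothetical convention). Returns a list of (tile, point) pairs.
    """
    (a, b), (c, d) = P
    det = a * d - b * c
    assert det != 0, "tile side vectors must be linearly independent"
    adj = ((d, -b), (-c, a))  # adjugate of P, so that H = adj / det

    # Step 1: enumerate candidate tiles. H p is linear in p, so the extreme
    # tile coordinates over a rectangle are reached at its corners.
    corners = [(0, 0), (n_rows - 1, 0), (0, n_cols - 1), (n_rows - 1, n_cols - 1)]
    corner_tiles = [tile_of(p, adj, det) for p in corners]
    t0_range = range(min(t[0] for t in corner_tiles), max(t[0] for t in corner_tiles) + 1)
    t1_range = range(min(t[1] for t in corner_tiles), max(t[1] for t in corner_tiles) + 1)

    # Offsets of the four vertices of one tile relative to its origin P t.
    xs = [0, a, b, a + b]
    ys = [0, c, d, c + d]

    order = []
    for t in product(t0_range, t1_range):
        ox = a * t[0] + b * t[1]
        oy = c * t[0] + d * t[1]
        # Step 2: sweep the tile naively via its bounding box, keeping only
        # points that lie in the iteration space and belong to tile t.
        for x in range(ox + min(xs), ox + max(xs) + 1):
            for y in range(oy + min(ys), oy + max(ys) + 1):
                in_space = 0 <= x < n_rows and 0 <= y < n_cols
                if in_space and tile_of((x, y), adj, det) == t:
                    order.append((t, (x, y)))
    return order


if __name__ == "__main__":
    # Parallelepiped tile with side vectors (2, 0) and (1, 2) on a 6x6 space.
    for tile, point in tiled_points(((2, 1), (0, 2)), 6, 6)[:8]:
        print(tile, point)
```

Some candidate tiles in the enumerated range may contain no points of the space; the sketch simply emits nothing for them, whereas tighter tile bounds would skip them outright.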
Keywords :
parallel architectures; parallelising compilers; Fourier-Motzkin elimination; arbitrary iteration space shapes; coarse grain parallelism; code generation; data locality; distributed memory machines; iteration space bounds; nested for-loops; nested loops; nonunimodular transformations; partitioning; tiled loops; tiling transformation; Computer Society; Concurrent computing; Delay; Parallel processing; Processor scheduling; Scheduling algorithm
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
DOI :
10.1109/TPDS.2003.1239870