DocumentCode :
2997646
Title :
Fast indexing for blocked array layouts to improve multi-level cache locality
Author :
Athanasaki, Evangelia ; Koziris, Nectarios
Author_Institution :
National Tech. Univ. of Athens, Greece
fYear :
2004
fDate :
15 Feb. 2004
Firstpage :
107
Lastpage :
119
Abstract :
One of the key challenges facing computer architects and compiler writers is the increasing discrepancy between processor cycle times and main memory access times. To overcome this problem, program transformations that decrease cache misses are used to reduce the average latency of memory accesses. Tiling is a widely used loop iteration reordering technique for improving locality of reference. In this paper, we further reduce cache misses by restructuring the memory layout of multi-dimensional arrays that are accessed by tiled instruction code. In our method, array elements are stored in a blocked way, exactly as they are swept by the tiled instruction stream. We present a straightforward way to translate multi-dimensional array indices into their blocked memory layout using simple binary-mask operations. Indices for such array layouts are calculated using the algebra of dilated integers, similarly to Morton-order indexing. Experimental results on three different hardware platforms, using five benchmarks, show that execution time is greatly improved when tiled code is combined with tiled array layouts and binary mask-based index translation functions. Both TLB and L1 cache misses are minimized concurrently for the same tile size; thus, applying the proposed layouts greatly improves locality of reference. Finally, simulations using the SimpleScalar tool verify that the enhanced performance is due to the considerable reduction of cache misses at all levels of the memory hierarchy.
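The index translation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a square n x n array with n and the tile size t both powers of two, tiles stored in row-major order, and elements stored row-major inside each tile. The names `blocked_offset`, `layout_masks`, and `next_dilated` are hypothetical and chosen only for this example.

```python
def blocked_offset(i, j, n, t):
    """Offset of element (i, j) in an n x n array stored as t x t tiles.

    Assumption: n and t are powers of two, tiles are laid out row-major,
    and elements inside a tile are laid out row-major.
    """
    log_t = t.bit_length() - 1
    log_n = n.bit_length() - 1
    low = t - 1                              # mask for the in-tile bits
    i_hi, i_lo = i >> log_t, i & low         # tile row, row inside the tile
    j_hi, j_lo = j >> log_t, j & low         # tile column, column inside the tile
    # Bit layout of the offset, from high to low: [i_hi][j_hi][i_lo][j_lo]
    return (i_hi << (log_n + log_t)) | (j_hi << (2 * log_t)) | (i_lo << log_t) | j_lo


def layout_masks(n, t):
    """Bit masks selecting the offset bits owned by the row and column index."""
    log_t = t.bit_length() - 1
    log_n = n.bit_length() - 1
    low = t - 1
    tiles = n // t - 1                       # mask for the tile-number bits
    row_mask = (low << log_t) | (tiles << (log_n + log_t))
    col_mask = low | (tiles << (2 * log_t))
    return row_mask, col_mask


def next_dilated(x, mask):
    """Increment a dilated integer whose bits live only where mask has ones.

    Standard dilated-integer identity: subtracting the mask lets the carry
    ripple across the gaps; the final AND clears the gap bits again.
    The value wraps to 0 after the largest representable index.
    """
    return (x - mask) & mask


def traverse(n, t):
    """Yield (i, j, offset) in row-major index order using only mask operations."""
    row_mask, col_mask = layout_masks(n, t)
    r = 0
    for i in range(n):
        c = 0
        for j in range(n):
            yield i, j, r | c                # combine the two dilated indices
            c = next_dilated(c, col_mask)
        r = next_dilated(r, row_mask)
```

In this sketch the loop never multiplies or divides: each step advances a dilated row or column index with one subtraction and one AND, and the element offset is the OR of the two, which is the kind of cheap index arithmetic the abstract attributes to binary masks.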
Keywords :
cache storage; computer architecture; instruction sets; program compilers; program control structures; storage allocation; Simplescalar tool; array element; average latency reduction; binary-mask operations; blocked array layouts; blocked memory layout; cache misses reduction; compiler writers; computer architecture; dilated integers algebra; execution time; index translation functions; loop iteration reordering; memory access times; memory hierarchy; morton-order indexing; multidimensional array indexing; multidimensional arrays; multilevel cache locality; processor cycle times; program transformations; reference locality; simulations; tiled array layout; tiled instruction code; tiled instruction stream; Algebra; Delay; Hardware; Indexing; Laboratories; Law; Pipeline processing; Registers; Systems engineering and theory; Tiles;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Interaction between Compilers and Computer Architectures, 2004. INTERACT-8 2004. Eighth Workshop on
Print_ISBN :
0-7695-2061-8
Type :
conf
DOI :
10.1109/INTERA.2004.1299515
Filename :
1299515