Title :
StVEC: A Vector Instruction Extension for High Performance Stencil Computation
Author :
Sedaghati, Naser ; Thomas, Renji ; Pouchet, Louis-Noël ; Teodorescu, Radu ; Sadayappan, P.
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Abstract :
Stencil computations comprise the compute-intensive core of many scientific applications. The data access pattern of stencil computations often requires several adjacent data elements of arrays to be accessed in innermost parallel loops. Although such loops are vectorized by current compilers like GCC and ICC that target short-vector SIMD instruction sets, a number of redundant loads or additional intra-register data shuffle operations are required, reducing the achievable performance. Thus, even when all arrays are cache resident, the peak performance achieved with stencil computations is considerably lower than machine peak. In this paper, we present a hardware-based solution for this problem. We propose an extension to the standard addressing mode of vector floating-point instructions in ISAs such as SSE, AVX, VMX, etc. Specifically, we propose an extended mode of paired-register addressing and its hardware implementation, to overcome the performance limitation of current short-vector SIMD ISAs for stencil computations. Further, we present a code generation approach that can be used by a vectorizing compiler for processors with such an instruction set. Using an optimistic as well as a pessimistic emulation of the proposed instruction extension, we demonstrate the effectiveness of the proposed approach on top of SSE- and AVX-capable processors. We also synthesize parts of the proposed design using a 45nm CMOS library and show minimal impact on processor cycle time.
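The access pattern described in the abstract can be illustrated with a minimal sketch in C. It is not taken from the paper: the 3-point stencil, the function names `stencil_scalar` and `stencil_blocked`, and the 4-element vector width `VLEN` are all assumptions chosen for illustration, and plain loops stand in for real SIMD instructions. The blocked version shows why each output vector needs three overlapping input vectors, which a short-vector ISA has to obtain through redundant loads or register shuffles.

```c
#include <stddef.h>

#define VLEN 4  /* assumed vector width: 4 floats, as in a 128-bit register */

/* Scalar reference: 3-point 1D stencil over the interior of a[]. */
void stencil_scalar(const float *a, float *b, size_t n) {
    for (size_t i = 1; i + 1 < n; i++)
        b[i] = a[i - 1] + a[i] + a[i + 1];
}

/* Blocked version that mimics short-vector execution (hypothetical sketch).
 * Each block of VLEN outputs reads three input blocks that overlap by
 * VLEN - 1 elements: on real SIMD hardware these would be three loads of
 * largely the same data, or one load plus shuffles across registers. */
void stencil_blocked(const float *a, float *b, size_t n) {
    size_t i = 1;
    for (; i + VLEN + 1 <= n; i += VLEN) {
        const float *left  = &a[i - 1];  /* elements i-1 .. i+VLEN-2 */
        const float *mid   = &a[i];      /* elements i   .. i+VLEN-1 */
        const float *right = &a[i + 1];  /* elements i+1 .. i+VLEN   */
        for (size_t k = 0; k < VLEN; k++)
            b[i + k] = left[k] + mid[k] + right[k];
    }
    /* Scalar remainder for the interior points that do not fill a block. */
    for (; i + 1 < n; i++)
        b[i] = a[i - 1] + a[i] + a[i + 1];
}
```

Both functions write only the interior elements `b[1]` through `b[n-2]`, so the caller is expected to handle the boundary values separately.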
Keywords :
instruction sets; natural sciences computing; parallel processing; program compilers; vector processor systems; AVX; CMOS library; GCC; ICC; ISA; SSE; StVEC; VMX; code generation approach; compilers; compute intensive core; high performance stencil computation; intraregister data shuffle operations; paired register addressing; parallel loops; processor cycle time; scientific applications; short vector SIMD instruction sets; vector floating point instructions; vector instruction extension; Arrays; Decoding; Hardware; Program processors; Registers; Vectors; High Performance; Stencil Computation; Vector ISA;
Conference_Title :
Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on
Conference_Location :
Galveston, TX
Print_ISBN :
978-1-4577-1794-9
DOI :
10.1109/PACT.2011.59