Optimizing SIMD Parallel Computation with Non-Consecutive Array Access in Inline SSE Assembly Language

Author

Juan, Chen ; Canqun, Yang

Author_Institution

Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China

fYear

2012

fDate

12-14 Jan. 2012

Firstpage

254

Lastpage

257

Abstract

Many processors, such as the Intel Xeon processor 5100 series and the AMD Athlon 64, support the SIMD computation model through the Streaming SIMD Extensions (SSE), SSE2 and SSE3. Double-precision SSE/SSE2/SSE3 instructions can process two packed double-precision floating-point data elements at once in 128-bit XMM vector registers, which greatly improves floating-point performance. Sometimes, however, the data in a SIMD computation are non-consecutive rather than consecutive, which prevents SIMD optimization: two non-consecutive double-precision floating-point data elements cannot be loaded into a 128-bit vector register simultaneously and must instead be loaded in two separate operations. Our concern is how to implement SIMD optimization for non-consecutive data. Loop unrolling exposes the pattern and characteristics of such non-consecutive data, and register rotation helps transform non-consecutive data into vector data. Using a representative kernel program, we illustrate our SIMD optimization, which combines loop unrolling with register rotation. By vectorizing non-consecutive data, the performance of the "KERNEL" code is improved by 42.4% and that of the PQMRCGSTAB application by 15.3%.
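
The load problem described above can be illustrated with a minimal sketch. The abstract does not show the authors' kernel or the details of their register rotation scheme, so everything here is an assumption made for illustration: the stride-2 kernel y[i] = c[i] * x[2*i], the function names, and the use of SSE2 compiler intrinsics in place of inline assembly. Unrolling the loop by two shows that iterations i and i+1 need x[2i] and x[2i+2]; two full 128-bit loads followed by a register shuffle (unpcklpd) then pack those two non-consecutive elements into one XMM register. The shuffle stands in for the register-combining step and may differ from the paper's approach.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical reference kernel: x is read with stride 2 (non-consecutive). */
void scale_strided_scalar(double *y, const double *c, const double *x, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = c[i] * x[2 * i];
}

/* Same kernel, unrolled by two and vectorized.
 * Assumption: x holds at least 2*n doubles, because the second full load
 * touches x[2*i + 3], one element past what the scalar loop needs. */
void scale_strided_sse2(double *y, const double *c, const double *x, size_t n)
{
    assert(y != NULL && c != NULL && x != NULL);

    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d lo = _mm_loadu_pd(x + 2 * i);      /* x[2i],   x[2i+1] */
        __m128d hi = _mm_loadu_pd(x + 2 * i + 2);  /* x[2i+2], x[2i+3] */
        __m128d xs = _mm_unpacklo_pd(lo, hi);      /* x[2i],   x[2i+2] */
        __m128d cs = _mm_loadu_pd(c + i);          /* c[i],    c[i+1]  */
        _mm_storeu_pd(y + i, _mm_mul_pd(cs, xs));  /* y[i],    y[i+1]  */
    }
    /* Scalar remainder when n is odd. */
    for (; i < n; ++i)
        y[i] = c[i] * x[2 * i];
}
```

Unaligned loads and stores are used so the sketch makes no alignment assumptions; with 16-byte-aligned arrays the aligned variants (_mm_load_pd, _mm_store_pd) could be substituted.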

Keywords

floating point arithmetic; microprocessor chips; parallel processing; program compilers; AMD Athlon 64; Intel Xeon processor 5100 series; SIMD computation model; SIMD optimization; SIMD parallel computation optimisation; SSE; XMM vector registers; floating-point data; inline SSE assembly language; nonconsecutive array access; streaming SIMD extensions; Arrays; Assembly; Kernel; Optimization; Program processors; Registers; Vectors; SIMD; SSE/SSE2/SSE3; inline assembly; loop unrolling; nonconsecutive data; register rotation;

fLanguage

English

Publisher

IEEE

Conference_Titel

Intelligent Computation Technology and Automation (ICICTA), 2012 Fifth International Conference on

Conference_Location

Zhangjiajie, Hunan

Print_ISBN

978-1-4673-0470-2

Type

conf

DOI

10.1109/ICICTA.2012.70

Filename

6150189