DocumentCode
3307950
Title
Optimizing SIMD Parallel Computation with Non-Consecutive Array Access in Inline SSE Assembly Language
Author
Juan, Chen ; Canqun, Yang
Author_Institution
Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
fYear
2012
fDate
12-14 Jan. 2012
Firstpage
254
Lastpage
257
Abstract
Many processors, such as the Intel Xeon processor 5100 series and the AMD Athlon 64, support the SIMD computation model through the Streaming SIMD Extensions (SSE), SSE2, and SSE3. Double-precision SSE/SSE2/SSE3 instructions can process two packed double-precision floating-point data elements simultaneously in 128-bit XMM vector registers, which greatly improves floating-point performance. Sometimes, however, a SIMD computation operates on non-consecutive rather than consecutive data, which prevents SIMD optimization: two non-consecutive double-precision floating-point elements cannot be loaded into a 128-bit vector register in a single operation and must instead be loaded in two separate steps. This paper addresses how to implement SIMD optimization for such non-consecutive data. Loop unrolling exposes the access pattern and characteristics of the non-consecutive data, and register rotation helps transform non-consecutive data into vector data. Using a representative kernel program, we illustrate our SIMD optimization, which combines loop unrolling with register rotation. By vectorizing non-consecutive data, the performance of the "KERNEL" code is improved by 42.4% and that of the PQMRCGSTAB application by 15.3%.
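The two-step load described in the abstract can be sketched in C with SSE2 intrinsics. This is only an illustration of the problem, not the authors' kernel or their register-rotation scheme: the stride-2 update y[i] += a * x[2*i], the function names strided_axpy_scalar and strided_axpy_sse2, and the use of intrinsics instead of inline assembly are all assumptions made for the example. The caller is assumed to supply at least 2*n - 1 elements in x.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics; guarded so the file still builds elsewhere */
#endif

/* Hypothetical scalar reference: y[i] += a * x[2*i] (x is read with stride 2). */
void strided_axpy_scalar(double *y, const double *x, double a, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[2 * i];
}

/* Hypothetical SSE2 version: x[2i] and x[2i+2] are not adjacent in memory,
 * so one 128-bit load cannot fetch both. Each half of the XMM register is
 * filled by its own 64-bit load before the packed multiply and add. */
void strided_axpy_sse2(double *y, const double *x, double a, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128d va = _mm_set1_pd(a);              /* broadcast a to both lanes */
    for (; i + 2 <= n; i += 2) {
        __m128d vx = _mm_load_sd(&x[2 * i]);        /* low lane  = x[2i]   */
        vx = _mm_loadh_pd(vx, &x[2 * i + 2]);       /* high lane = x[2i+2] */
        __m128d vy = _mm_loadu_pd(&y[i]);           /* y is consecutive: one load */
        vy = _mm_add_pd(vy, _mm_mul_pd(va, vx));    /* two updates per instruction */
        _mm_storeu_pd(&y[i], vy);
    }
#endif
    /* Scalar remainder (and full fallback when SSE2 is unavailable). */
    for (; i < n; ++i)
        y[i] += a * x[2 * i];
}
```

Calling both functions on the same input should give identical results. The SSE2 version still needs two loads per vector of x, and that is the cost the paper's combination of loop unrolling and register rotation aims to reduce.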
Keywords
floating point arithmetic; microprocessor chips; parallel processing; program compilers; AMD Athlon 64; Intel Xeon processor 5100 series; SIMD computation model; SIMD optimization; SIMD parallel computation optimisation; SSE; XMM vector registers; floating-point data; inline SSE assembly language; nonconsecutive array access; streaming SIMD extensions; Arrays; Assembly; Kernel; Optimization; Program processors; Registers; Vectors; SIMD; SSE/SSE2/SSE3; inline assembly; loop unrolling; nonconsecutive data; register rotation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Computation Technology and Automation (ICICTA), 2012 Fifth International Conference on
Conference_Location
Zhangjiajie, Hunan
Print_ISBN
978-1-4673-0470-2
Type
conf
DOI
10.1109/ICICTA.2012.70
Filename
6150189