Title :
SIF: Overcoming the limitations of SIMD devices via implicit permutation
Author :
Huang, Libo ; Shen, Li ; Wang, Zhiying ; Shi, Wei ; Xiao, Nong ; Ma, Sheng
Author_Institution :
Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
SIMD devices have gained widespread acceptance in modern microprocessor designs because of their superior performance on multimedia applications. However, three limitations still hinder the efficient use of SIMD devices in general-purpose computer systems: memory alignment, data reorganization, and control flow. This paper presents SIF, an efficient SIMD interface framework that addresses these three limitations without modifying the existing ISA. It is designed around a permutation vector register file (PVRF) and adds new extended instructions that set internal permutation state in the SIMD datapath, rather than placing permutation state-setting bits in every instruction. The implicit permutation capability provided by the PVRF incurs zero overhead, removing the need to handle the three limitations with explicit permutation instructions. To further reduce the number of state-setting instructions in the SIMD datapath, a technique that moves this workload from the SIMD pipeline into the scalar pipeline is also introduced. With the help of the proposed compilation algorithm, regular SIMD code can be efficiently transformed into SIF code, which makes SIF easy to integrate into all existing SIMD devices. We implemented these techniques in a vectorizing compiler; experimental results show that most of the permutation overhead instructions can be eliminated and that a distinct performance speedup can be achieved, on average 37% higher than current SIMD techniques.
Keywords :
general purpose computers; logic design; microprocessor chips; parallel processing; pipeline processing; ISA; SIF; SIMD datapath; SIMD devices; SIMD interface framework; SIMD pipeline; control flow; data reorganization; general purpose computer systems; implicit permutation; memory alignment; microprocessor designs; multimedia applications; permutation vector register file; scalar pipeline; single instruction multiple data; state setting instructions; Application software; Control systems; Data flow computing; Hardware; Instruction sets; Microprocessors; Pipelines; Process design; Registers; Virtual colonoscopy;
Conference_Title :
High Performance Computer Architecture (HPCA), 2010 IEEE 16th International Symposium on
Conference_Location :
Bangalore
Print_ISBN :
978-1-4244-5658-1
DOI :
10.1109/HPCA.2010.5416631