Author_Institution :
Comput. Sci. & Inf. Dept., Taibah Univ., Madinah, Saudi Arabia
Abstract :
This paper proposes a new processor architecture for accelerating data-parallel applications that combines the VLIW and vector processing paradigms. It uses a VLIW architecture to process multiple independent scalar instructions concurrently on parallel execution units. Data parallelism is expressed by a vector ISA and processed on the same parallel execution units of the VLIW architecture. The proposed processor, called VecLIW, has a unified register file of 64×32-bit registers in the decode stage for storing scalar/vector data. VecLIW can issue up to four scalar/vector operations in each cycle, processing a set of operands in parallel and producing up to four results. However, it cannot issue more than one memory operation at a time; each memory operation loads/stores 128-bit scalar/vector data from/to the data cache. Four 32-bit results can be written back into the VecLIW register file. The complete design of the proposed VecLIW processor is implemented in VHDL targeting the Xilinx Virtex-5 FPGA (XC5VLX110T-3FF1136 device). The design requires 3,992 slice registers and 14,826 LUTs (14,570 for logic and 256 for memory). The number of LUT-FF pairs used is 17,425, of which 13,433 have an unused flip-flop, 2,599 have an unused LUT, and 1,393 are fully used.
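The issue limits stated in the abstract can be illustrated with a minimal sketch. It is not the authors' VHDL design; the `Op` type, the mnemonics, and the function names are hypothetical, and only the numeric limits (four operations per cycle, one memory operation, 128-bit memory access, 32-bit registers) come from the abstract.

```python
from dataclasses import dataclass

# Limits taken from the abstract.
ISSUE_WIDTH = 4        # up to four scalar/vector operations per cycle
MAX_MEM_OPS = 1        # at most one memory operation at a time
WORD_BITS = 32         # register width
MEM_ACCESS_BITS = 128  # width of one load/store to the data cache


@dataclass(frozen=True)
class Op:
    """Hypothetical operation record; 'name' is an illustrative mnemonic."""
    name: str
    is_memory: bool = False


def can_issue(packet):
    """Return True if the packet respects both issue limits from the abstract."""
    if len(packet) > ISSUE_WIDTH:
        return False
    mem_ops = sum(1 for op in packet if op.is_memory)
    return mem_ops <= MAX_MEM_OPS


def words_per_memory_access():
    """Number of 32-bit elements moved by one 128-bit memory operation."""
    return MEM_ACCESS_BITS // WORD_BITS
```

Under these assumptions, a packet of three arithmetic operations plus one load is accepted, while a packet containing two loads is rejected, and one memory operation moves as many 32-bit elements as the processor can write back in one step.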
Keywords :
cache storage; field programmable gate arrays; flip-flops; hardware description languages; instruction sets; microprocessor chips; multiprocessing systems; parallel architectures; LUT-FF pairs; VHDL; VLIW architecture; VecLIW processor; VecLIW register file; XC5VLX110T-3FF1136 device; Xilinx FPGA Virtex-5; data cache; data parallelism; data-parallel applications; multiple independent scalar instructions; multiscalar-vector instructions; parallel execution units; parallel processing; slice registers; storage capacity 128 bit; unified datapath; unused flip-flops; vector ISA; vector processing; Computer architecture; Hardware; Pipelines; Registers; VLIW; Vectors; FPGA/VHDL implementation; data-level parallelism