Title :
Block Processor: A resource-distributed architecture
Author :
Zeke Wang ; Feng Yu ; Xue Liu
Author_Institution :
Inst. of Digital Technol. & Instrum., Zhejiang Univ., Hangzhou, China
Abstract :
We present the architecture of the Block Processor, a task-level coprocessor that executes vectorizable computing tasks migrated from the main processor via a command bus. The Block Processor is designed around 32 high-MVL block registers, which serve both as direct operands of vector instructions and as the local cache of the Block Processor. The corresponding conflict-solving mechanism scales across implementations and readily supports chaining by adding extra execution states. The architecture distributes the block registers, ALUs, and control logic. We implement the Block Processor on an FPGA, to which it maps efficiently because the FPGA also distributes its internal resources. Each block register requires two FPGA Block RAMs to provide two read ports and one write port at a depth of 1024 and a width of 32 bits. With enhanced chaining and decoupling, the Block Processor can hide the latency of vector memory instructions and thereby sustain its computing throughput. While occupying few resources, one Block Processor computes a 1024-point radix-2 DIF FFT in 11348 cycles.
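The abstract says each block register uses two Block RAMs to obtain two read ports and one write port, but it does not say how the two RAMs are wired. One common FPGA technique that fits this description is replication: every write goes to both RAMs, and each RAM serves one of the two read ports. The behavioral model below is a minimal sketch under that assumption, not the authors' design. The class name BlockRegister2R1W and its methods are hypothetical, and timing (read latency, read-during-write behavior) is not modeled.

```python
class BlockRegister2R1W:
    """Behavioral model of a 2-read-1-write register file.

    Assumption (not confirmed by the abstract): the two read ports are
    obtained by keeping two identical copies of the data, one per RAM bank.
    """

    DEPTH = 1024  # entries per block register, as stated in the abstract
    WIDTH = 32    # bits per entry, as stated in the abstract

    def __init__(self):
        self._mask = (1 << self.WIDTH) - 1
        # Two banks that always hold identical contents.
        self._banks = [[0] * self.DEPTH for _ in range(2)]

    def _check(self, addr):
        if not 0 <= addr < self.DEPTH:
            raise IndexError(f"address {addr} outside 0..{self.DEPTH - 1}")

    def write(self, addr, value):
        """Single write port: the word is stored in both banks."""
        self._check(addr)
        word = value & self._mask  # truncate to 32 bits
        for bank in self._banks:
            bank[addr] = word

    def read2(self, addr_a, addr_b):
        """Two read ports: port A reads bank 0, port B reads bank 1."""
        self._check(addr_a)
        self._check(addr_b)
        return self._banks[0][addr_a], self._banks[1][addr_b]


if __name__ == "__main__":
    reg = BlockRegister2R1W()
    reg.write(3, 7)
    reg.write(9, 11)
    # Both operands of a two-input vector operation are fetched in one step.
    print(reg.read2(3, 9))
```

Under this assumption the storage cost doubles, which would be consistent with the two Block RAMs per register quoted in the abstract. In exchange, two source operands can be read at independent addresses while a result is written.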
Keywords :
DRAM chips; SRAM chips; cache storage; coprocessors; field programmable gate arrays; 1024-point radix-2 DIF FFT costs; 2-read-1-write-port; ALUs; FPGA block RAM; SDRAM; block processor; command bus; conflict-solving mechanism; control logic; execution states; high-MVL block registers; local cache; resource-distributed architecture; task-level coprocessor; vector memory instructions; word length 32 bit; Computer architecture; Field programmable gate arrays; Hazards; Ports (Computers); Random access memory; Registers; Vectors; block register; chaining; co-processor; vector
Conference_Title :
High Performance Extreme Computing Conference (HPEC), 2013 IEEE
Conference_Location :
Waltham, MA
Print_ISBN :
978-1-4799-1364-0
DOI :
10.1109/HPEC.2013.6670321