Title : 
Design of OpenCL-compatible multithreaded hardware accelerators with dynamic support for embedded FPGAs
         
        
            Author : 
Alfonso Rodríguez;Juan Valverde;Eduardo de la Torre
         
        
            Author_Institution : 
Center of Industrial Electronics, Technical University of Madrid, Madrid, Spain
         
        
        
        
        
            Abstract : 
ARTICo3 is an architecture that allows an arbitrary number of reconfigurable hardware accelerators to be instantiated dynamically, each containing a given number of threads fixed at design time according to High-Level Synthesis constraints. The replication of these modules, however, can be decided at runtime to accelerate kernels by increasing the overall number of threads, to add modular redundancy for increased fault tolerance, or any combination of the two. An execution scheduler is used at kernel invocation to issue the appropriate data transfers, optimizing memory transactions and sequencing or parallelizing execution according to the configuration specified by the resource manager of the architecture. The model of computation is compatible with the OpenCL kernel execution model, and memory transfers and architecture are arranged to match the same optimization criteria as kernel execution on GPU architectures but, unlike other approaches, with dynamic hardware execution support. In this paper, a novel design methodology for multithreaded hardware accelerators is presented. The proposed framework provides OpenCL compatibility by implementing a memory model based on two elements: shared memory between host and compute device, which removes the overhead imposed by data transfers at the global memory level, and local memories inside each accelerator (i.e., compute unit), which are connected to global memory through optimized DMA links. These local memories provide unified access (i.e., a continuous memory map) from the host side, but are divided into a configurable number of independent banks (to increase the number of available ports) on the processing-element side to fully exploit data-level parallelism. Experimental results show OpenCL model compliance using multithreaded hardware accelerators and enhanced dynamic adaptation capabilities.
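
The banked local memory described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the continuous host-side map is split across banks, so this example assumes simple block partitioning, where each bank holds one contiguous slice of the address range. The function name `host_to_bank` and its parameters are hypothetical and are not taken from the paper.

```python
from typing import Tuple


def host_to_bank(addr: int, mem_words: int, num_banks: int) -> Tuple[int, int]:
    """Map a host-side word address to (bank index, offset inside that bank).

    Assumption (not from the paper): block partitioning, i.e. bank 0 holds
    the first mem_words // num_banks words, bank 1 the next slice, and so on.
    """
    if num_banks <= 0 or mem_words % num_banks != 0:
        raise ValueError("mem_words must be a positive multiple of num_banks")
    if not 0 <= addr < mem_words:
        raise IndexError("address outside the local memory map")
    bank_words = mem_words // num_banks
    # divmod yields (addr // bank_words, addr % bank_words) = (bank, offset)
    return divmod(addr, bank_words)


if __name__ == "__main__":
    # A 16-word local memory split into 4 banks: the host sees addresses
    # 0..15, while each processing element port sees one 4-word bank.
    for a in (0, 5, 15):
        print(a, host_to_bank(a, mem_words=16, num_banks=4))
```

Under this assumption, the host writes through one continuous address range while the processing elements can read several banks in the same cycle, which is the property the abstract attributes to the banked layout. An interleaved mapping would be an equally plausible reading of the text.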
         
        
            Keywords : 
"Hardware","Computer architecture","Random access memory","Kernel","Computational modeling","Adaptation models","Parallel processing"
         
        
        
            Conference_Title : 
ReConFigurable Computing and FPGAs (ReConFig), 2015 International Conference on
         
        
        
            DOI : 
10.1109/ReConFig.2015.7393297