Title :
Exploring DMA-assisted prefetching strategies for software caches on multicore clusters
Author :
Pinto, Claudio ; Benini, Luca
Author_Institution :
DEI Dept., Univ. of Bologna, Bologna, Italy
Abstract :
Modern many-core programmable accelerators are often composed of several computing units grouped in clusters, with a shared per-cluster scratchpad data memory. The main programming challenge posed by these architectures is hiding the latency of transfers from external memory to the on-chip scratchpad, by overlapping memory transfers with computation as much as possible. This problem is usually tackled using complex DMA-based programming patterns (e.g. double buffering), which require heavy refactoring of applications. Software caches are an alternative to hand-optimized DMA programming. However, even though a software cache can reduce programming effort, it still relies on synchronous memory transfers: on a cache miss, the new line is copied into the cache and the requesting processor has to wait for the transfer to complete. While waiting, the processor cannot perform any other computation. Cache line prefetching can reduce the number of synchronous memory transfers, and increase the active time of each processor, by loading cache lines before they are actually needed. In this work we explore various DMA-based prefetching techniques applied to a software cache implementation, presenting both automatic and programmer-assisted prefetch mechanisms applied to computer vision kernels.
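The stall behaviour described in the abstract can be illustrated with a small cycle-count model. This is only a sketch under stated assumptions, not the authors' implementation: the class `SoftCache`, the function `run`, and the constants `LINE_WORDS`, `DMA_LATENCY` and `WORK_PER_ACCESS` are hypothetical, the cache never evicts, DMA transfers never contend with each other, and the prefetch policy is a plain next-line prefetch, which may differ from the mechanisms explored in the paper.

```python
from dataclasses import dataclass, field

# All constants are illustrative assumptions, not values from the paper.
LINE_WORDS = 8        # words per cache line
DMA_LATENCY = 40      # cycles to transfer one line from external memory
WORK_PER_ACCESS = 10  # compute cycles spent per word read


@dataclass
class SoftCache:
    """Toy software cache: unlimited capacity, no DMA contention."""
    prefetch: bool
    now: int = 0      # current cycle
    stall: int = 0    # cycles the processor spent waiting on transfers
    ready_at: dict = field(default_factory=dict)  # line -> cycle it becomes usable

    def read(self, addr: int) -> None:
        line = addr // LINE_WORDS
        if line not in self.ready_at:
            # Miss with no transfer in flight: start one now and wait for it.
            self.ready_at[line] = self.now + DMA_LATENCY
        # Wait only for whatever part of the transfer is still outstanding.
        wait = max(0, self.ready_at[line] - self.now)
        self.stall += wait
        self.now += wait
        if self.prefetch and line + 1 not in self.ready_at:
            # Next-line prefetch: start an asynchronous transfer and keep computing.
            self.ready_at[line + 1] = self.now + DMA_LATENCY
        self.now += WORK_PER_ACCESS


def run(prefetch: bool, n_words: int = 256) -> int:
    """Scan n_words sequentially and return the total stall cycles."""
    cache = SoftCache(prefetch=prefetch)
    for addr in range(n_words):
        cache.read(addr)
    return cache.stall


if __name__ == "__main__":
    print("stall cycles, synchronous only:", run(prefetch=False))
    print("stall cycles, next-line prefetch:", run(prefetch=True))
```

In this model a sequential scan stalls once per line without prefetching, while with next-line prefetching only the first line causes a stall, because the computation on each line takes longer than the transfer of the next one. With less work per access, or with an irregular access pattern, the prefetch would hide only part of the latency.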
Keywords :
cache storage; computer vision; microprocessor chips; multiprocessing systems; storage management; DMA-assisted prefetching strategies; complex DMA-based programming patterns; computer vision kernels; computing units; hand-optimized DMA programming; many-core programmable accelerators; multicore clusters; on-chip scratchpad memory transfer latency; shared per-cluster scratchpad data memory; software caches; synchronous memory transfers; Clocks; Hardware; Indexes; Kernel; Prefetching; Programming;
Conference_Title :
Application-specific Systems, Architectures and Processors (ASAP), 2014 IEEE 25th International Conference on
Conference_Location :
Zurich
DOI :
10.1109/ASAP.2014.6868666