DocumentCode
1772648
Title
Exploring DMA-assisted prefetching strategies for software caches on multicore clusters
Author
Pinto, Claudio ; Benini, Luca
Author_Institution
DEI Dept., Univ. of Bologna, Bologna, Italy
fYear
2014
fDate
18-20 June 2014
Firstpage
224
Lastpage
231
Abstract
Modern many-core programmable accelerators are often composed of several computing units grouped in clusters, each with a shared per-cluster scratchpad data memory. The main programming challenge these architectures pose is hiding the latency of transfers from external memory to the on-chip scratchpad memory, overlapping memory transfers with actual computation as much as possible. This problem is usually tackled with complex DMA-based programming patterns (e.g. double buffering), which require heavy refactoring of applications. Software caches are an alternative to hand-optimized DMA programming. However, even though a software cache can reduce the programming effort, it still relies on synchronous memory transfers: on a cache miss, the new line is copied into the cache and the requesting processor must wait for the transfer to complete. While waiting, processors cannot perform any other computation. Cache-line prefetching can reduce the number of synchronous memory transfers and increase the active time of each processor by loading cache lines before they are actually needed. In this work we explore various DMA-based prefetching techniques applied to a software cache implementation, presenting both automatic and programmer-assisted prefetch mechanisms applied to computer vision kernels.
Keywords
cache storage; computer vision; microprocessor chips; multiprocessing systems; storage management; DMA-assisted prefetching strategies; complex DMA-based programming patterns; computer vision kernels; computing units; hand-optimized DMA programming; many-core programmable accelerators; multicore clusters; on-chip scratchpad memory transfer latency; shared per-cluster scratchpad data memory; software caches; synchronous memory transfers; Clocks; Hardware; Indexes; Kernel; Prefetching; Programming;
fLanguage
English
Publisher
IEEE
Conference_Title
Application-specific Systems, Architectures and Processors (ASAP), 2014 IEEE 25th International Conference on
Conference_Location
Zurich
Type
conf
DOI
10.1109/ASAP.2014.6868666
Filename
6868666