Regional Information Center for Science and Technology - Exploring DMA-assisted prefetching strategies for software caches on multicore clusters

DocumentCode :

1772648

Title :

Exploring DMA-assisted prefetching strategies for software caches on multicore clusters

Author :

Pinto, Claudio ; Benini, Luca

Author_Institution :

DEI Dept., Univ. of Bologna, Bologna, Italy

fYear :

2014

fDate :

18-20 June 2014

Firstpage :

224

Lastpage :

231

Abstract :

Modern many-core programmable accelerators are often composed of several computing units grouped in clusters, with a shared per-cluster scratchpad data memory. The main programming challenge posed by these architectures is hiding the latency of transfers from external memory to the on-chip scratchpad, by overlapping memory transfers with actual computation as much as possible. This problem is usually tackled with complex DMA-based programming patterns (e.g. double buffering), which require heavy refactoring of applications. Software caches are an alternative to hand-optimized DMA programming. However, even though a software cache can reduce the programming effort, it still relies on synchronous memory transfers: on a cache miss, the new line is copied into the cache and the requesting processor has to wait for the transfer to complete. While waiting, the processor cannot perform any other computation. Cache line prefetching can reduce the number of synchronous memory transfers, and increase the active time of each processor, by loading cache lines before they are actually needed. In this work we explore various DMA-based prefetching techniques applied to a software cache implementation, presenting both automatic and programmer-assisted prefetch mechanisms applied to computer vision kernels.
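
The effect described in the abstract can be illustrated with a toy timing model. The sketch below is not the authors' implementation: it assumes a hypothetical software cache with unlimited capacity (no eviction), a DMA engine that can keep any number of transfers in flight, and made-up values for line size, transfer latency and per-access compute time. It only shows how a simple next-line prefetch, issued asynchronously, can overlap a line transfer with computation and shrink the time a processor spends blocked on misses.

```python
from dataclasses import dataclass, field

# All three constants are illustrative assumptions, not values from the paper.
LINE_SIZE = 32            # bytes per software cache line
DMA_LATENCY = 100         # cycles needed to transfer one line
COMPUTE_PER_ACCESS = 10   # cycles of useful work done after each access


@dataclass
class SoftCache:
    """Toy model: tracks when each line becomes available, never evicts."""
    prefetch: bool = False
    now: int = 0      # current cycle
    stall: int = 0    # total cycles spent blocked on a transfer
    ready_at: dict = field(default_factory=dict)  # line index -> ready cycle

    def _issue_dma(self, line: int) -> None:
        # Start a transfer unless the line is already present or in flight.
        if line not in self.ready_at:
            self.ready_at[line] = self.now + DMA_LATENCY

    def access(self, addr: int) -> None:
        line = addr // LINE_SIZE
        self._issue_dma(line)                 # on a miss, the transfer starts now
        wait = max(0, self.ready_at[line] - self.now)
        self.stall += wait                    # processor blocks until data arrives
        self.now += wait
        if self.prefetch:
            self._issue_dma(line + 1)         # next-line prefetch, non-blocking
        self.now += COMPUTE_PER_ACCESS        # computation overlaps the prefetch


def run(prefetch: bool, n_bytes: int = 1024, stride: int = 4) -> int:
    """Sequentially read n_bytes and return the total stall cycles."""
    cache = SoftCache(prefetch=prefetch)
    for addr in range(0, n_bytes, stride):
        cache.access(addr)
    return cache.stall


if __name__ == "__main__":
    print("stall cycles without prefetch:", run(False))
    print("stall cycles with next-line prefetch:", run(True))
```

With these assumed parameters the sequential scan touches 32 lines. Without prefetching every line costs a full 100-cycle stall. With next-line prefetching only the first line does, and each later line stalls for the 20 cycles of transfer time not covered by the 80 cycles of computation on the previous line. Real access patterns, limited cache capacity and DMA contention would change these numbers.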

Keywords :

cache storage; computer vision; microprocessor chips; multiprocessing systems; storage management; DMA-assisted prefetching strategies; complex DMA-based programming patterns; computer vision kernels; computing units; hand-optimized DMA programming; many-core programmable accelerators; multicore clusters; on-chip scratchpad memory transfer latency; shared per-cluster scratchpad data memory; software caches; synchronous memory transfers; Clocks; Hardware; Indexes; Kernel; Prefetching; Programming;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Application-specific Systems, Architectures and Processors (ASAP), 2014 IEEE 25th International Conference on

Conference_Location :

Zurich

Type :

conf

DOI :

10.1109/ASAP.2014.6868666

Filename :

6868666

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1772648