Title :
Exploiting DMA to enable non-blocking execution in Decoupled Threaded Architecture
Author :
Giorgi, Roberto ; Popovic, Zdravko ; Puzovic, Nikola
Author_Institution :
Dept. of Inf. Eng., Univ. of Siena, Siena, Italy
Abstract :
DTA (Decoupled Threaded Architecture) is designed to exploit fine- and medium-grained Thread-Level Parallelism (TLP) by using a distributed hardware scheduling unit and by relying on existing simple cores (in-order pipelines, no branch predictors, no reorder buffers). In DTA, local variables and synchronization data are communicated through a fast frame memory. If the compiler cannot remove accesses to global data, the threads become excessively fragmented. In this paper, we therefore present an implementation of a prefetching mechanism for global data that complements the original DTA pre-load mechanism (for producer-consumer data patterns), with the aim of improving the non-blocking execution of threads. Our implementation is based on an enhanced DMA mechanism that prefetches global data. In an initial implementation, we estimated the benefit of the proposed approach and identified the support it requires. When memory access latency is longer, our approach can reduce execution time substantially (e.g., by a factor of 11 for the zoom benchmark on 8 processors) compared to the case without prefetching.
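The following is a minimal sketch of why prefetching enables non-blocking execution: if every input a thread needs, whether it comes from a producer thread or from global memory, lands in the thread's frame before the thread is scheduled, the thread body never stalls on a memory access. The abstract does not describe how DMA completions interact with scheduling, so this sketch assumes that each completed transfer counts toward the same per-thread synchronization count as a producer store. The `Frame` class, `dma_prefetch`, `run_thread`, and the addresses used are hypothetical illustrations, not the authors' design.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """Hypothetical model of a DTA-style thread frame (names are illustrative)."""
    sync_count: int                          # inputs still missing before the thread may run
    slots: dict = field(default_factory=dict)

    def deliver(self, slot, value):
        """Store one input into the frame; returns True once the thread is ready."""
        if self.sync_count <= 0:
            raise RuntimeError("frame already complete")
        self.slots[slot] = value
        self.sync_count -= 1
        return self.sync_count == 0

def dma_prefetch(frame, global_memory, requests):
    """Hypothetical DMA step: copy (slot, address) pairs from global memory into
    the frame. Each completed transfer is counted like a producer store (assumption).
    Returns True if the frame became ready after the last transfer."""
    ready = False
    for slot, addr in requests:
        ready = frame.deliver(slot, global_memory[addr])
    return ready

def run_thread(frame):
    """Thread body: reads only frame slots, so it never stalls on global memory."""
    if frame.sync_count != 0:
        raise RuntimeError("thread scheduled before all inputs arrived")
    return frame.slots["x"] * frame.slots["g"]

# Example: one producer-supplied value plus one global value fetched by DMA.
global_memory = {0x100: 10, 0x104: 20, 0x108: 30, 0x10C: 40}
frame = Frame(sync_count=2)
first = frame.deliver("x", 5)                                # pre-load from a producer thread
second = dma_prefetch(frame, global_memory, [("g", 0x10C)])  # DMA prefetch of global data
print(first, second, run_thread(frame))                      # prints: False True 200
```

In this model the order of arrival does not matter: the DMA transfer could complete before the producer store, and the thread still becomes ready only when the count reaches zero.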
Keywords :
multi-threading; scheduling; storage management; synchronisation; DMA; data prefetching mechanism; decoupled threaded architecture; distributed hardware scheduling unit; fast frame memory; fine/medium grained thread level parallelism; nonblocking thread execution; synchronization data; Delay; Design engineering; Earth; Hardware; Job shop scheduling; Pipelines; Prefetching; Processor scheduling; Protocols; Yarn;
Conference_Title :
Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on
Conference_Location :
Rome
Print_ISBN :
978-1-4244-3751-1
ISSN :
1530-2075
DOI :
10.1109/IPDPS.2009.5161111