Title :
Optimal loop scheduling for hiding memory latency based on two-level partitioning and prefetching
Author :
Wang, Zhong; O'Neil, Timothy W.; Sha, Edwin H.-M.
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Notre Dame, Notre Dame, IN, USA
Date :
1 November 2001
Abstract :
The large latency of memory accesses in modern computers is a key obstacle to achieving high processor utilization, and a variety of techniques have been devised to hide this latency, ranging from cache hierarchies to various prefetching and memory management schemes for manipulating the data held in the caches. In DSP applications, the prevalence of large uniform nested loops makes loop scheduling especially important. In this paper, we propose a new memory management technique for computer architectures with three levels of memory, the organization generally adopted in contemporary systems. The technique exploits access pattern information available at compile time by prefetching certain data elements from the higher-level memory before they are explicitly requested by the lower-level memory or CPU, and by retaining certain data for a period of time to prevent unnecessary data swapping. To better exploit the locality of reference present in these loop structures, the technique partitions the memory at two levels and restricts execution to one partition at a time, so that data locality is much improved over the usual access pattern. Together, these approaches (a new set of memory instructions combined with memory partitioning) improve average execution times by approximately 35% over the one-level partition algorithm and by more than 80% over list scheduling and hardware prefetching.
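Illustration (not from the paper): the following minimal C sketch conveys the general flavor of two-level loop partitioning combined with software prefetching. The partition sizes P1 and P2, the prefetch distance PF, and the loop body are assumptions chosen for illustration; the paper's actual scheme is a compile-time scheduling algorithm over a three-level memory hierarchy with its own memory instructions, not this code.

```c
/*
 * Sketch only: shows two-level partitioning (tiling) of a uniform
 * nested loop plus software prefetching of data before it is
 * explicitly requested.  P1, P2, PF and the loop body are made-up
 * parameters, not values or methods taken from the paper.
 */
#include <stdio.h>

#define N   512          /* problem size (assumed)              */
#define P1  64           /* first-level partition size (assumed) */
#define P2  16           /* second-level partition size (assumed) */
#define PF  8            /* prefetch distance in elements (assumed) */

static float a[N][N], b[N][N];

int main(void)
{
    /* initialize input */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = (float)(i + j);

    /* outer loops step through first-level partitions, middle loops
       through second-level partitions, and the inner loops execute one
       partition so its working set stays resident in nearer memory */
    for (int ii = 0; ii < N; ii += P1)
        for (int jj = 0; jj < N; jj += P2)
            for (int i = ii; i < ii + P1; i++)
                for (int j = jj; j < jj + P2; j++) {
                    /* fetch a later element of the row before it is
                       explicitly requested, hiding its memory latency
                       (GCC/Clang builtin: addr, read, low locality) */
                    if (j + PF < N)
                        __builtin_prefetch(&a[i][j + PF], 0, 1);
                    b[i][j] = 2.0f * a[i][j];  /* stand-in uniform loop body */
                }

    printf("b[N-1][N-1] = %f\n", b[N - 1][N - 1]);
    return 0;
}
```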
Keywords :
memory architecture; optimisation; processor scheduling; signal processing; storage management; DSP applications; access pattern information; average execution times; computer architectures; data elements; data locality; higher level memory; memory accesses; memory instructions; memory latency; memory management; optimal loop scheduling; prefetching; processor utilization; two-level partitioning; uniform nested loops; Application software; Central Processing Unit; Computer architecture; Delay; Digital signal processing; Memory management; Partitioning algorithms; Prefetching; Processor scheduling; Scheduling algorithm;
Journal_Title :
IEEE Transactions on Signal Processing