Title :
The impact of parallel loop scheduling strategies on prefetching in a shared memory multiprocessor
Author_Institution :
Dept. of Electr. Eng., Minnesota Univ., Minneapolis, MN, USA
Date :
6/1/1994
Abstract :
Trace-driven simulations of numerical Fortran programs are used to study the impact of the parallel loop scheduling strategy on data prefetching in a shared memory multiprocessor with private data caches. The simulations indicate that, to maximize memory performance, it is important to schedule blocks of consecutive iterations to execute on each processor, and then to adaptively prefetch single-word cache blocks to match the number of iterations scheduled. Prefetching multiple single-word cache blocks on a miss reduces the miss ratio by approximately 5% to 30% compared to a system with no prefetching. In addition, the proposed adaptive prefetching scheme further reduces the miss ratio while significantly reducing the false sharing among cache blocks compared to nonadaptive prefetching strategies. Reducing the false sharing causes fewer coherence invalidations to be generated, and thereby reduces the total network traffic. The impact of the prefetching and scheduling strategies on the temporal distribution of coherence invalidations is also examined. It is found that invalidations tend to be evenly distributed throughout the execution of parallel loops, but tend to be clustered during the execution of sequential program sections. The distribution of invalidations in both types of program sections is relatively insensitive to the prefetching and scheduling strategy.
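The scheduling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' simulator: the function names, the unit-stride assumption, and the `max_degree` cap are hypothetical and chosen only for illustration. The sketch shows two ways of handing out blocks of consecutive iterations (a static block split, and guided self-scheduling in its commonly described form, where each request takes the ceiling of the remaining iterations divided by the processor count), and then one assumed way of matching the number of single-word blocks prefetched on a miss to the size of the scheduled block.

```python
from math import ceil


def static_block_chunks(n_iters, n_procs):
    """Split iterations 0..n_iters-1 into one contiguous block per processor."""
    base, extra = divmod(n_iters, n_procs)
    chunks, start = [], 0
    for p in range(n_procs):
        # The first `extra` processors take one additional iteration.
        size = base + (1 if p < extra else 0)
        chunks.append((start, size))
        start += size
    return chunks


def guided_chunks(n_iters, n_procs):
    """Guided self-scheduling: each request takes ceil(remaining / n_procs)."""
    chunks, start, remaining = [], 0, n_iters
    while remaining > 0:
        size = ceil(remaining / n_procs)
        chunks.append((start, size))
        start += size
        remaining -= size
    return chunks


def prefetch_degree(chunk_size, max_degree=8):
    """Hypothetical adaptive rule (an assumption, not taken from the paper).

    Assumes unit-stride accesses, so the iterations left in the block touch
    the next chunk_size - 1 consecutive words. The cap max_degree is an
    illustrative parameter.
    """
    return max(0, min(chunk_size - 1, max_degree))


if __name__ == "__main__":
    for start, size in guided_chunks(100, 4):
        print(f"iterations {start}..{start + size - 1}: "
              f"prefetch {prefetch_degree(size)} extra single-word blocks on a miss")
```

Under this assumed rule, a block containing a single iteration triggers no extra prefetches, so the later, smaller guided chunks fetch fewer words that another processor might own, which is the intuition behind the reduced false sharing reported in the abstract.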
Keywords :
buffer storage; parallel programming; performance evaluation; scheduling; shared memory systems; cache coherence; cache pollution; data caches; false sharing; guided self-scheduling; memory performance; numerical Fortran programs; parallel loop scheduling; prefetching; shared memory multiprocessor; single-word cache blocks; trace-driven simulations; Application software; Dynamic scheduling; Hardware; Large-scale systems; Multiprocessor interconnection networks; Numerical simulation; Pollution; Prefetching; Processor scheduling; Software performance;
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on