DocumentCode :
2535826
Title :
Efficient Work Stealing for Fine Grained Parallelism
Author :
Faxén, Karl-Filip
Author_Institution :
Swedish Inst. of Comput. Sci., Stockholm, Sweden
fYear :
2010
fDate :
13-16 Sept. 2010
Firstpage :
313
Lastpage :
322
Abstract :
This paper deals with improving the performance of fine-grained task parallelism. It is often either cumbersome or impossible to increase the grain size of such programs. Increasing core counts exacerbates the problem; a program that appears coarse-grained on eight cores may well look much more fine-grained on sixty-four. In this paper we present the direct task stack, a novel work stealing algorithm with unusually low overheads, both for creating tasks and for stealing. We compare the performance of our scheduler to Cilk++, the icc implementation of OpenMP 3.0, and the Intel TBB library on an eight-core, dual-socket Opteron machine. We also analyze why our techniques achieve consistent speedups over the other systems, ranging from 2-3x on many fine-grained workloads to over 50x in extreme cases, and show quantitatively how each of the techniques we use contributes to the improved performance.
Keywords :
microprocessor chips; multiprocessing systems; parallel programming; task analysis; Cilk++ scheduler; Intel TBB library; OpenMP 3.0; direct task stack; dual socket Opteron machine; fine grain task parallelism; work stealing algorithm; Load management; Parallel processing; Program processors; Sparse matrices; Stress; Synchronization; Wool; multicore; task parallelism; work stealing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Processing (ICPP), 2010 39th International Conference on
Conference_Location :
San Diego, CA
ISSN :
0190-3918
Print_ISBN :
978-1-4244-7913-9
Electronic_ISBN :
0190-3918
Type :
conf
DOI :
10.1109/ICPP.2010.39
Filename :
5599176