DocumentCode
2958106
Title
Efficient Resource Oblivious Algorithms for Multicores with False Sharing
Author
Cole, Richard ; Ramachandran, Vijaya
Author_Institution
Comput. Sci. Dept., NYU, New York, NY, USA
fYear
2012
fDate
21-25 May 2012
Firstpage
201
Lastpage
214
Abstract
We consider algorithms for a multicore environment in which each core has its own private cache and false sharing can occur. False sharing happens when two or more processors access the same block (i.e., cache-line) in parallel, and at least one processor writes into a location in the block. False sharing causes different processors to have inconsistent views of the data in the block, and many of the methods currently used to resolve these inconsistencies can cause large delays. We analyze the cost of false sharing both for variables stored on the execution stacks of the parallel tasks and for output variables. Our main technical contribution is to establish a low cost for this overhead for the class of multithreaded block-resilient HBP (Hierarchical Balanced Parallel) computations. Using this and other techniques, we develop block-resilient HBP algorithms with low false sharing costs for several fundamental problems including scans, matrix multiplication, FFT, sorting, and hybrid block-resilient HBP algorithms for list ranking and graph connected components. Most of these algorithms are derived from known multicore algorithms, but are further refined to achieve a low false sharing overhead. Our algorithms make no mention of machine parameters, and our analysis of the false sharing overhead is mostly in terms of the the number of tasks generated in parallel during the computation, and thus applies to a variety of schedulers.
Keywords
cache storage; graph theory; matrix multiplication; multi-threading; multiprocessing systems; parallel algorithms; parallel architectures; resource allocation; FFT; block location; execution stacks; false sharing; graph connected components; hybrid block-resilient HBP algorithms; inconsistent data views; list ranking; machine parameters; matrix multiplication; multicore algorithms; multicore environment; multithreaded block-resilient HBP computations; multithreaded block-resilient hierarchical balanced parallel computations; parallel block; parallel tasks generation; private cache; resource oblivious algorithms; Algorithm design and analysis; Computational modeling; Delay; Instruction sets; Multicore processing; Upper bound; cache-efficiency; false-sharing; multicores;
fLanguage
English
Publisher
ieee
Conference_Titel
Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International
Conference_Location
Shanghai
ISSN
1530-2075
Print_ISBN
978-1-4673-0975-2
Type
conf
DOI
10.1109/IPDPS.2012.28
Filename
6267836
Link To Document