DocumentCode
2190388
Title
Exploiting More Parallelism from Applications Having Generalized Reductions on GPU Architectures
Author
Wu, Xiao-Long ; Obeid, Nady ; Hwu, Wen-Mei
Author_Institution
Dept. of Electr. & Comput. Eng., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
fYear
2010
fDate
June 29 2010-July 1 2010
Firstpage
1175
Lastpage
1180
Abstract
Reduction is a common component of many applications, but it can often be the limiting factor for parallelization. Previous reduction work has focused on detecting reduction idioms and parallelizing the reduction operation by minimizing data communication or exploiting more data locality. While these techniques can be useful, they are mostly limited to simple code structures. In this paper, we propose a method for exploiting more parallelism by isolating the reduction from users of the intermediate results. The other main contribution of our work is enabling the parallelization of more complex reduction codes, including those that use intermediate reduction results. Programmers often implement the proposed transformations in an ad-hoc manner, but to the best of our knowledge no previous work has automated these transformations for many-core architectures. Using two benchmark applications, we show that the automatic transformations can yield significant speedups over the original code.
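The abstract's central idea, separating a reduction from the code that consumes its intermediate values, can be illustrated with a small C sketch. Everything in it is a hypothetical illustration chosen here, not the paper's transformation or benchmarks: the loop shape (a running sum that each iteration multiplies by a weight), the function names `reduce_with_users_serial` and `reduce_with_users_split`, and the use of a blocked inclusive scan to materialize the intermediate results. The loops commented as independent are plain sequential C here; on a GPU they would be the candidates for mapping onto threads or thread blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical original pattern: the running sum is both a reduction and
   an input to every iteration, so iteration i must wait for iteration i-1. */
long reduce_with_users_serial(const long *a, const long *w, long *out, size_t n) {
    long acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += a[i];          /* reduction update */
        out[i] = acc * w[i];  /* user of the intermediate result */
    }
    return acc;               /* final reduction result */
}

/* Hypothetical transformed pattern: the reduction is isolated as a blocked
   inclusive scan that stores every intermediate value, after which the
   users form a loop with no loop-carried dependence.
   prefix must hold n elements; block_sums must hold ceil(n / block). */
long reduce_with_users_split(const long *a, const long *w, long *out,
                             long *prefix, long *block_sums,
                             size_t n, size_t block) {
    assert(block > 0);
    if (n == 0) return 0;
    size_t nblocks = (n + block - 1) / block;

    /* Phase 1: local inclusive scan inside each block.
       Iterations of the outer loop (blocks) are independent. */
    for (size_t b = 0; b < nblocks; b++) {
        size_t lo = b * block;
        size_t hi = (lo + block < n) ? lo + block : n;
        long s = 0;
        for (size_t i = lo; i < hi; i++) {
            s += a[i];
            prefix[i] = s;
        }
        block_sums[b] = s;
    }

    /* Phase 2: exclusive scan of the per-block totals (short; serial here). */
    long carry = 0;
    for (size_t b = 0; b < nblocks; b++) {
        long t = block_sums[b];
        block_sums[b] = carry;  /* offset = sum of all earlier blocks */
        carry += t;
    }

    /* Phase 3: add each block's offset; every element is independent. */
    for (size_t i = 0; i < n; i++)
        prefix[i] += block_sums[i / block];

    /* Users of the intermediate results: also fully independent. */
    for (size_t i = 0; i < n; i++)
        out[i] = prefix[i] * w[i];

    return prefix[n - 1];     /* equals the final reduction result */
}
```

For example, with `a = {3, 1, 4, 1, 5, 9, 2}`, `w = {1, 2, 3, 4, 5, 6, 7}`, and a block size of 3, both functions return 25 and fill `out` with the same values. The sketch trades O(n) extra memory for the `prefix` array in exchange for removing the dependence between consecutive iterations of the user loop.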
Keywords
coprocessors; parallel architectures; GPU architecture; automatic transformation; complex reduction codes; data communication; data locality; many-core architecture; reduction idioms; reduction operation; Computers; Conferences; Information technology; Automatic Transformation; Compiler Techniques; GPUs; Graphics Processors; Many-Core; Reduction;
fLanguage
English
Publisher
ieee
Conference_Titel
Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on
Conference_Location
Bradford
Print_ISBN
978-1-4244-7547-6
Type
conf
DOI
10.1109/CIT.2010.213
Filename
5577899