DocumentCode
3327033
Title
Concurrent predicates: a debugging technique for every parallel programmer
Author
Wenhao Jia ; Shaw, Kelly A. ; Martonosi, Margaret
Author_Institution
Princeton Univ., Princeton, NJ, USA
fYear
2013
fDate
7-11 Sept. 2013
Firstpage
331
Lastpage
340
Abstract
Graphics processing units (GPUs) are in increasingly wide use, but significant hurdles lie in selecting the appropriate algorithms, runtime parameter settings, and hardware configurations to achieve power and performance goals with them. Exploring hardware and software choices requires time-consuming simulations or extensive real-system measurements. While some auto-tuning support has been proposed, it is often narrow in scope and heuristic in operation. This paper proposes and evaluates a statistical analysis technique, Starchart, that partitions the GPU hardware/software tuning space by automatically discerning important inflection points in design parameter values. Unlike prior methods, Starchart can identify the best parameter choices within different regions of the space. Our tool is efficient - evaluating at most 0.3% of the tuning space, and often much less - and is robust enough to analyze highly variable real-system measurements, not just simulation. In one case study, we use it to automatically find platform-specific parameter settings that are 6.3× faster (for AMD) and 1.3× faster (for NVIDIA) than a single general setting. We also show how power-optimized parameter settings can save 47W (26% of total GPU power) with little performance loss. Overall, Starchart can serve as a foundation for a range of GPU compiler optimizations, auto-tuners, and programmer tools. Furthermore, because Starchart does not rely on specific GPU features, we expect it to be useful for broader CPU/GPU studies as well.
Keywords
graphics processing units; hardware-software codesign; program compilers; recursive estimation; regression analysis; trees (mathematics); GPU compiler optimizations; GPU hardware/software tuning space; NVIDIA; Starchart; auto-tuners; auto-tuning support; design parameter values; graphics processing units; hardware configurations; hardware optimization; platform-specific parameter settings; power-optimized parameter settings; programmer tools; real-system measurements; recursive partitioning regression trees; runtime parameter settings; software optimization; statistical analysis technique; time-consuming simulations; Graphics processing units; Hardware; Kernel; Optimization; Power measurement; Tuning; concurrent predicate expressions; concurrent predicates;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel Architectures and Compilation Techniques (PACT), 2013 22nd International Conference on
Conference_Location
Edinburgh
ISSN
1089-795X
Print_ISBN
978-1-4799-1018-2
Type
conf
DOI
10.1109/PACT.2013.6618822
Filename
6618822