HiDP: A hierarchical data parallel language

Author

Zhang, Yongpeng ; Mueller, Frank

Author_Institution

North Carolina State Univ., Raleigh, NC, USA

fYear

2013

fDate

23-27 Feb. 2013

Firstpage

1

Lastpage

11

Abstract

Problem domains are commonly decomposed hierarchically to fully utilize parallel resources in modern microprocessors. Such decompositions can be provided as library routines, written by experts, for general algorithmic patterns. But such APIs tend to be constrained to certain architectures or data sizes. Integrating them with application code is often an unnecessarily daunting task, especially when these routines need to be closely coupled with user code to achieve better performance. This paper contributes HiDP, a high-level hierarchical data parallel language. The purpose of HiDP is to improve the coding productivity of integrating hierarchical data parallelism without significant loss of performance. HiDP is implemented as a source-to-source compiler that converts a very concise data parallel language into CUDA C++ source code. Internally, it performs the necessary analysis to compose user code with efficient and architecture-aware code snippets. This paper discusses various aspects of HiDP systematically: the language, the compiler and the run-time system with built-in tuning capabilities. Together, these enable HiDP users to express algorithms in less code than low-level SDKs require for native platforms. HiDP also exposes the abundant computing resources of modern parallel architectures. Improved coding productivity tends to come with a sacrifice in performance. Yet, experimental results show that the generated code delivers performance very close to handcrafted native GPU code.

Keywords

C++ language; application program interfaces; graphics processing units; microprocessor chips; parallel architectures; parallel languages; program compilers; source coding; APIs; CUDA C++ source code; HiDP; algorithmic patterns; architecture-aware code snippets; built-in tuning capabilities; coding productivity; computing resources; data parallel language; data sizes; handcrafted native GPU code; hierarchical data parallelism; high-level hierarchical data parallel language; library routines; low-level SDKs; microprocessors; parallel architectures; parallel resources; run-time system; source-to-source compiler; user code; Arrays; Graphics processing units; Kernel; Libraries; Parallel processing; Shape; Synchronization;

fLanguage

English

Publisher

IEEE

Conference_Titel

Code Generation and Optimization (CGO), 2013 IEEE/ACM International Symposium on

Conference_Location

Shenzhen

Print_ISBN

978-1-4673-5524-7

Type

conf

DOI

10.1109/CGO.2013.6494994

Filename

6494994