DocumentCode
3664235
Title
Understanding Performance Portability of OpenACC for Supercomputers
Author
Suttinee Sawadsitang;James Lin;Simon See;Francois Bodin;Satoshi Matsuoka
Author_Institution
Shanghai Jiao Tong Univ., Shanghai, China
fYear
2015
fDate
5/1/2015 12:00:00 AM
Firstpage
699
Lastpage
707
Abstract
Scientific applications need to be moved among supercomputers, such as Tianhe-2 and TSUBAME 2.5. OpenACC provides a directive-based approach that keeps a single source code base functionally portable across the different accelerators used in these supercomputers. However, the OpenACC standard does not guarantee performance portability. Therefore, we propose a systematic optimization method, instead of auto-tuning by compilers, to achieve reasonable portable performance with minor code modifications. With this method, we evaluate four kernels from the Rodinia benchmark suite and one mini-application, Hydro, on our hybrid "CPU+GPU+MIC" supercomputer π with the CAPS and PGI compilers. We analyze the Parallel Thread Execution (PTX) code to further understand performance portability, and find that CAPS adopts a different strategy from PGI in thread distribution. The evaluation results show that the optimized OpenACC versions can achieve a better performance portability ratio than the OpenCL version in some cases. This understanding and the method are valuable for OpenACC application developers who want to use the available OpenACC compilers efficiently and correctly.
Keywords
"Graphics processing units","Microwave integrated circuits","Optimization","Kernel","Supercomputers","Instruction sets","Standards"
Publisher
ieee
Conference_Titel
Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International
Type
conf
DOI
10.1109/IPDPSW.2015.60
Filename
7284377
Link To Document