• DocumentCode
    3664235
  • Title

    Understanding Performance Portability of OpenACC for Supercomputers

  • Author

    Suttinee Sawadsitang;James Lin;Simon See;Francois Bodin;Satoshi Matsuoka

  • Author_Institution
    Shanghai Jiao Tong Univ., Shanghai, China
  • fYear
    2015
  • fDate
    5/1/2015 12:00:00 AM
  • Firstpage
    699
  • Lastpage
    707
  • Abstract
    Scientific applications need to be moved among supercomputers, such as Tianhe-2 and TSUBAME 2.5. OpenACC provides a directive-based approach that allows a single source code base to be functionally portable across the different accelerators used in these supercomputers. However, performance portability is not guaranteed by the OpenACC standard. Therefore, we propose a systematic optimization method, instead of auto-tuning by compilers, to achieve reasonable portable performance with minor code modifications. With this method, we evaluate four kernels from the Rodinia benchmark suite and one mini-application, Hydro, on our hybrid "CPU+GPU+MIC" supercomputer π with the CAPS and PGI compilers. We analyze Parallel Thread Execution (PTX) code to further understand the performance portability, and find that CAPS adopts a different thread distribution strategy from PGI. The evaluation results show that the optimized OpenACC versions can achieve a better performance portability ratio than the OpenCL version in some cases. This understanding and the method are valuable for OpenACC application developers who want to use the available OpenACC compilers efficiently and correctly.
  • Keywords
    "Graphics processing units","Microwave integrated circuits","Optimization","Kernel","Supercomputers","Instruction sets","Standards"
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International
  • Type

    conf

  • DOI
    10.1109/IPDPSW.2015.60
  • Filename
    7284377