• DocumentCode
    2440827
  • Title
    An auto-tuning framework for parallel multicore stencil computations
  • Author
    Kamil, Shoaib; Chan, Cy; Oliker, Leonid; Shalf, John; Williams, Samuel
  • Author_Institution
    CRD, Lawrence Berkeley Nat. Lab., Berkeley, CA, USA
  • fYear
    2010
  • fDate
    19-23 April 2010
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    Although stencil auto-tuning has shown tremendous potential in effectively utilizing architectural resources, it has hitherto been limited to single kernel instantiations; in addition, the large variety of stencil kernels used in practice makes this computation pattern difficult to assemble into a library. This work presents a stencil auto-tuning framework that significantly advances programmer productivity by automatically converting a straightforward sequential Fortran 95 stencil expression into tuned parallel implementations in Fortran, C, or CUDA, thus allowing performance portability across diverse computer architectures, including the AMD Barcelona, Intel Nehalem, Sun Victoria Falls, and the latest NVIDIA GPUs. Results show that our generalized methodology delivers significant performance gains of up to 22× speedup over the reference serial implementation. Overall we demonstrate that such domain-specific auto-tuners hold enormous promise for architectural efficiency, programmer productivity, performance portability, and algorithmic adaptability on existing and emerging multicore systems.
  • Keywords
    FORTRAN; microprocessor chips; parallel architectures; AMD Barcelona; CUDA; Intel Nehalem; NVIDIA GPU; Sun Victoria Falls; algorithmic adaptability; architectural efficiency; architectural resources; computation pattern; computer architectures; domain-specific auto-tuners; multicore systems; parallel implementations; parallel multicore stencil computations; performance portability; programmer productivity; reference serial implementation; sequential Fortran 95 stencil expression; single kernel instantiations; stencil auto-tuning framework; stencil kernels; Assembly; Computer architecture; Concurrent computing; Kernel; Libraries; Multicore processing; Performance gain; Productivity; Programming profession; Sun
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on
  • Conference_Location
    Atlanta, GA
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4244-6442-5
  • Type
    conf
  • DOI
    10.1109/IPDPS.2010.5470421
  • Filename
    5470421