  • DocumentCode
    1783374
  • Title
    Generalizing Run-Time Tiling with the Loop Chain Abstraction
  • Author
    Strout, Michelle Mills ; Luporini, Fabio ; Krieger, Christopher D. ; Bertolli, Carlo ; Bercea, Gheorghe-Teodor ; Olschanowsky, Catherine ; Ramanujam, J. ; Kelly, Paul H. J.
  • Author_Institution
    Colorado State Univ., Fort Collins, CO, USA
  • fYear
    2014
  • fDate
    19-23 May 2014
  • Firstpage
    1136
  • Lastpage
    1145
  • Abstract
    Many scientific applications are organized in a data parallel way: as sequences of parallel and/or reduction loops. This exposes parallelism well, but does not convert data reuse between loops into data locality. This paper focuses on this issue in parallel loops whose loop-to-loop dependence structure is data-dependent due to indirect references such as A[B[i]]. Such references are a common occurrence in sparse matrix computations, molecular dynamics simulations, and unstructured-mesh computational fluid dynamics (CFD). Previously, sparse tiling approaches were developed for individual benchmarks to group iterations across such loops to improve data locality. These approaches were shown to benefit applications such as moldyn, Gauss-Seidel, and the sparse matrix powers kernel; however, the run-time routines for performing sparse tiling were hand-coded per application. In this paper, we present a generalized full sparse tiling algorithm that uses the newly developed loop chain abstraction as input, improves inter-loop data locality, and creates a task graph to expose shared-memory parallelism at runtime. We evaluate the overhead and performance impact of the generalized full sparse tiling algorithm on two codes: a sparse Jacobi iterative solver and the Airfoil CFD benchmark.
  • Keywords
    abstract data types; computational fluid dynamics; mathematics computing; parallel programming; program control structures; software performance evaluation; Airfoil CFD benchmark; generalized full sparse tiling algorithm; inter-loop data locality improvement; loop chain abstraction; loop-to-loop dependence structure; overhead impact evaluation; parallel data organization; parallel loop sequence; performance impact evaluation; reduction loop sequence; run-time routines; run-time tiling generalization; scientific applications; shared-memory parallelism; sparse Jacobi iterative solver; task graph; Arrays; Benchmark testing; Indexes; Jacobian matrices; Parallel processing; Sparse matrices; inspector/executor; run-time reordering transformations; tiling;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
  • Conference_Location
    Phoenix, AZ
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4799-3799-8
  • Type
    conf
  • DOI
    10.1109/IPDPS.2014.118
  • Filename
    6877342