• DocumentCode
    2852662
  • Title
    Performance Analysis and Optimization of Parallel Scientific Applications on CMP Cluster Systems
  • Author
    Wu, Xingfu ; Taylor, Valerie ; Lively, Charles ; Sharkawi, Sameh
  • Author_Institution
    Dept. of Comput. Sci., Texas A&M Univ., College Station, TX
  • fYear
    2008
  • fDate
    8-12 Sept. 2008
  • Firstpage
    188
  • Lastpage
    195
  • Abstract
    Chip multiprocessors (CMPs) are widely used for high performance computing. Further, these CMPs are being configured in a hierarchical manner to compose a node in a cluster system. A major challenge to be addressed is the efficient use of such cluster systems for large-scale scientific applications. In this paper, we quantify the performance gap resulting from using different numbers of processors per node; this information is used to provide a baseline for the amount of optimization needed when using all processors per node on CMP clusters. We conduct detailed performance analysis to identify how applications can be modified to efficiently utilize all processors per node on CMP clusters, focusing on two scientific applications: a 3D particle-in-cell magnetic fusion application, the gyrokinetic toroidal code (GTC), and a lattice Boltzmann method (LBM) for simulating fluid dynamics. In terms of refinements, we use conventional techniques such as cache blocking, loop unrolling and loop fusion, and develop hybrid methods for optimizing MPI_Allreduce and MPI_Reduce. Using these optimizations, the application performance when utilizing all processors per node was improved by up to 18.97% for GTC and 15.77% for LBM on up to 2048 total processors on the CMP clusters.
  • Keywords
    microprocessor chips; multiprocessing systems; parallel processing; workstation clusters; CMP cluster system; MPI_Allreduce; MPI_Reduce; cache blocking; chip multiprocessors; fluid dynamics simulation; gyrokinetic toroidal code; high performance computing; lattice Boltzmann method; loop fusion; loop unrolling; parallel scientific application; Application software; Computer science; Fluid dynamics; Large-scale systems; Magnetohydrodynamics; Optimization methods; Parallel processing; Performance analysis; Toroidal magnetic fields; US Department of Energy; Performance Analysis; Performance Optimization; chip multiprocessors; cluster system; parallel scientific applications;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing - Workshops, 2008. ICPP-W '08. International Conference on
  • Conference_Location
    Portland, OR
  • ISSN
    1530-2016
  • Print_ISBN
    978-0-7695-3375-9
  • Electronic_ISBN
    1530-2016
  • Type
    conf
  • DOI
    10.1109/ICPP-W.2008.21
  • Filename
    4626800