• DocumentCode
    1921428
  • Title

    Application performance characterization and analysis on Blue Gene/Q

  • Author

    Walkup, Bob

  • fYear
    2012
  • fDate
    10-16 Nov. 2012
  • Firstpage
    2247
  • Lastpage
    2280
  • Abstract
    This article consists of a collection of slides from the author´s conference presentation. The author concludes that The Blue Gene/Q design, low-power simple cores, four hardware threads per core, resu lts in high instruction throughput, and thus exceptional power efficiency for applications. Can effectively fill in pipeline stalls and hide latencies in the memory subsystem. The consequence is low performance per thread, so a high degree of parallelization is required for high application performance. Traditional programming methods (MPI, OpenMP, Pthreads) hold up at very large scales. Memory costs can limit scaling when there are data-structures with size linear in the number of processes, threading helps by keeping the number of processes manageable. Detailed performance analysis is viable at > 10^6 processes but requires care. On-the-fly performance data reduction has merits.
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion:
  • Conference_Location
    Salt Lake City, UT
  • Print_ISBN
    978-1-4673-6218-4
  • Type

    conf

  • DOI
    10.1109/SC.Companion.2012.358
  • Filename
    6496143