• DocumentCode
    720556
  • Title

    Modeling Gather and Scatter with Hardware Performance Counters for Xeon Phi

  • Author

    Lin, James ; Nukada, Akira ; Matsuoka, Satoshi

  • Author_Institution
    Shanghai Jiao Tong Univ., China
  • fYear
    2015
  • fDate
    4-7 May 2015
  • Firstpage
    713
  • Lastpage
    716
  • Abstract
    Intel Initial Many-Core Instructions (IMCI) for Xeon Phi introduce hardware-implemented Gather and Scatter (G/S) instructions, which load/store the contents of SIMD registers from/to non-contiguous memory locations. However, they can be one of the key performance bottlenecks on Xeon Phi. Modeling G/S can provide insight into performance on Xeon Phi, but the existing solution requires a hand-written assembly implementation. Therefore, we modeled G/S with hardware performance counters, which can be profiled with tools such as PAPI. We profiled Address Generation Interlock (AGI) events as the number of G/S operations, estimated the average latency of G/S with VPU_DATA_READ, and combined the two to model the total latency of G/S. We applied our model to the 3D 7-point stencil, and the result showed that G/S accounted for nearly 40% of total kernel time. We also validated the model by implementing a G/S-free version with intrinsics. The contribution of this work is a performance model for G/S built with hardware counters. We believe the model can be generally applicable to CPUs as well.
  • Keywords
    coprocessors; flip-flops; parallel processing; storage allocation; 3D 7-point stencil; AGI events; G/S-free version; Intel Xeon Phi; SIMD registers; VPU_DATA_READ; address generation interlock; average G/S latency; hardware performance counters; hardware-implemented gather and scatter load-store contents; noncontiguous memory locations; Analytical models; Hardware; Kernel; Mathematical model; Radiation detectors; Solid modeling; Three-dimensional displays; Gather and Scatter; Hardware performance counters; Performance modeling; Xeon Phi;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on
  • Conference_Location
    Shenzhen
  • Type

    conf

  • DOI
    10.1109/CCGrid.2015.59
  • Filename
    7152539
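
The model summarized in the abstract combines a G/S count (taken from AGI events) with an average G/S latency (estimated with VPU_DATA_READ). The abstract does not give the exact formula, so the following is only a minimal sketch that assumes the simplest combination, a product of the two, and treats the average latency as an already-estimated input. The function names and the numbers in the usage example are hypothetical and are not taken from the paper.

```python
def gs_total_cycles(agi_events: int, avg_latency_cycles: float) -> float:
    """Estimate total G/S latency in cycles.

    Assumption (not confirmed by the source): total latency is the
    number of G/S operations, approximated by the AGI event count,
    multiplied by the average latency of one G/S operation.
    """
    if agi_events < 0 or avg_latency_cycles < 0:
        raise ValueError("counter values must be non-negative")
    return agi_events * avg_latency_cycles


def gs_time_fraction(agi_events: int,
                     avg_latency_cycles: float,
                     kernel_cycles: float) -> float:
    """Return the share of total kernel cycles attributed to G/S."""
    if kernel_cycles <= 0:
        raise ValueError("kernel_cycles must be positive")
    return gs_total_cycles(agi_events, avg_latency_cycles) / kernel_cycles


if __name__ == "__main__":
    # Hypothetical counter readings, chosen only for illustration.
    fraction = gs_time_fraction(agi_events=1_000_000,
                                avg_latency_cycles=8.0,
                                kernel_cycles=20_000_000)
    print(f"G/S share of kernel time: {fraction:.0%}")
```

In practice the two inputs would come from a counter profiling tool such as PAPI; how the average latency is derived from VPU_DATA_READ is left out here because the abstract does not specify it.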