• DocumentCode
    692880
  • Title

    Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors

  • Author

    Park, Jongsoo ; Bikshandi, Ganesh ; Vaidyanathan, Karthikeyan ; Tang, Ping Tak Peter ; Dubey, Pradeep ; Kim, Daehyun

  • Author_Institution
    Parallel Comput. Lab., USA
  • fYear
    2013
  • fDate
    17-22 Nov. 2013
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    This paper demonstrates the first tera-scale performance of Intel® Xeon Phi™ coprocessors on 1D FFT computations. Applying a disciplined performance programming methodology of sound algorithm choice, a valid performance model, and well-executed optimizations, we break the teraflop mark on a mere 64 nodes of Xeon Phi and reach 6.7 TFLOPS with 512 nodes, which is 1.5× what is achievable on the same number of Intel® Xeon® nodes. It is a challenge to fully utilize the compute capability of many-core wide-vector processors for bandwidth-bound FFT computation. We leverage a new algorithm, Segment-of-Interest FFT, with low inter-node communication cost, and aggressively optimize data movement in node-local computations by exploiting caches. Our coordination of a low-communication algorithm with a massively parallel architecture for scalable performance is not limited to running FFT on Xeon Phi; it can serve as a reference for other bandwidth-bound computations and for emerging HPC systems that are increasingly communication limited.
  • Keywords
    coprocessors; fast Fourier transforms; multiprocessing systems; parallel architectures; HPC systems; Intel Xeon Phi coprocessors; TFLOPS; bandwidth-bound FFT computation; data movement optimization; disciplined performance programming methodology; low communication algorithm; low inter-node communication cost; low-communication algorithm; many-core wide-vector processors; node-local computations; parallel architecture; segment-of-interest FFT; tera-scale 1D FFT; tera-scale performance; Abstracts; Demodulation; Optimization; Program processors; Bandwidth Optimizations; Communication-Avoiding Algorithms; FFT; Wide-Vector Many-Core Processors; Xeon Phi
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis (SC), 2013 International Conference for
  • Conference_Location
    Denver, CO
  • Print_ISBN
    978-1-4503-2378-9
  • Type

    conf

  • DOI
    10.1145/2503210.2503242
  • Filename
    6877467