Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors

Author

Park, Jongsoo ; Bikshandi, Ganesh ; Vaidyanathan, Karthikeyan ; Tang, Ping Tak Peter ; Dubey, Pradeep ; Kim, Daehyun

Author_Institution

Parallel Comput. Lab., USA

fYear

2013

fDate

17-22 Nov. 2013

Firstpage

1

Lastpage

12

Abstract

This paper demonstrates the first tera-scale performance of Intel® Xeon Phi™ coprocessors on 1D FFT computations. Applying a disciplined performance programming methodology of sound algorithm choice, a valid performance model, and well-executed optimizations, we break the tera-flop mark on a mere 64 nodes of Xeon Phi and reach 6.7 TFLOPS with 512 nodes, which is 1.5× what is achievable on the same number of Intel® Xeon® nodes. It is a challenge to fully utilize the compute capability presented by many-core wide-vector processors for bandwidth-bound FFT computation. We leverage a new algorithm, Segment-of-Interest FFT, with low inter-node communication cost, and aggressively optimize data movements in node-local computations, exploiting caches. Our coordination of a low-communication algorithm and a massively parallel architecture for scalable performance is not limited to running FFT on Xeon Phi; it can serve as a reference for other bandwidth-bound computations and for emerging HPC systems that are increasingly communication limited.

Keywords

coprocessors; fast Fourier transforms; multiprocessing systems; parallel architectures; HPC systems; Intel Xeon Phi coprocessors; TFLOPS; bandwidth-bound FFT computation; data movement optimization; disciplined performance programming methodology; low communication algorithm; low inter-node communication cost; low-communication algorithm; many-core wide-vector processors; node-local computations; parallel architecture; segment-of-interest FFT; tera-scale 1D FFT; tera-scale performance; Abstracts; Demodulation; Optimization; Program processors; Bandwidth Optimizations; Communication-Avoiding Algorithms; FFT; Wide-Vector Many-Core Processors; Xeon Phi;

fLanguage

English

Publisher

IEEE

Conference_Titel

High Performance Computing, Networking, Storage and Analysis (SC), 2013 International Conference for

Conference_Location

Denver, CO

Print_ISBN

978-1-4503-2378-9

Type

conf

DOI

10.1145/2503210.2503242

Filename

6877467
