DocumentCode
130391
Title
Performance analysis of a scalable algorithm for 3D linear transforms
Author
Lirkov, Ivan ; Paprzycki, Marcin ; Ganzha, Maria ; Sedukhin, Stanislav ; Gepner, Pawel
Author_Institution
Inst. of Inf. & Commun. Technol., Sofia, Bulgaria
fYear
2014
fDate
7-10 Sept. 2014
Firstpage
613
Lastpage
622
Abstract
Practical realizations of 3D forward/inverse separable discrete transforms, such as the Fourier transform and the cosine/sine transforms, are frequently the principal limiters that prevent many practical applications from scaling to a large number of processors. Specifically, existing approaches, which are based primarily on 1D or 2D data decompositions, prevent the 3D transforms from effectively scaling to the maximum possible (or available) number of computer nodes. Recently, a novel, highly scalable approach to realizing forward/inverse 3D transforms has been proposed. It is based on a 3D decomposition of data and is geared towards a torus network of computer nodes. The proposed algorithms require compute-and-roll time-steps, where each step consists of the execution of multiple GEMM operations and the concurrent movement of cubical data blocks between nearest-neighbor nodes (directly using the logical arrangement of the nodes within the torus). The proposed 3D orbital algorithms gracefully avoid the otherwise required 3D data transposition. The aim of this paper is to present a preliminary experimental performance study of the proposed implementation on two different high-performance computer architectures.
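As an illustration of why a separable 3D transform can be expressed through matrix-matrix (GEMM-style) products, the following minimal single-node sketch applies one dense coefficient matrix along each axis of a 3D array and checks the result against NumPy's FFT. It is not the paper's implementation: the function names `dft_matrix` and `separable_3d_transform` are hypothetical, the array shape is arbitrary, and the distributed compute-and-roll schedule on a torus network is not modeled.

```python
import numpy as np


def dft_matrix(n):
    """Dense n x n DFT coefficient matrix (unnormalized, numpy.fft sign convention)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)


def separable_3d_transform(x, c1, c2, c3):
    """Apply coefficient matrices c1, c2, c3 along axes 0, 1, 2 of x.

    Each einsum call is a matrix product along one axis, so the whole
    transform is three batched matrix multiplications and needs no
    explicit transposition of the data.
    """
    y = np.einsum("ai,ijk->ajk", c1, x)   # transform along axis 0
    y = np.einsum("bj,ajk->abk", c2, y)   # transform along axis 1
    y = np.einsum("ck,abk->abc", c3, y)   # transform along axis 2
    return y


# Illustrative input; the shape (4, 5, 6) is an arbitrary assumption.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 6))

y = separable_3d_transform(x, dft_matrix(4), dft_matrix(5), dft_matrix(6))

# The three axis-wise products should reproduce the 3D DFT.
matches_fft = np.allclose(y, np.fft.fftn(x))
```

Replacing `dft_matrix` with a cosine or sine coefficient matrix would give the corresponding separable transform with the same three products.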
Keywords
discrete transforms; inverse transforms; mathematics computing; parallel architectures; 3D data decomposition; 3D data transposition; 3D forward separable discrete transform; 3D inverse separable discrete transform; 3D linear transforms; 3D orbital algorithms; compute-and-roll time-steps; computer nodes; concurrent movement; cubical data blocks; high-performance computer architectures; highly-scalable approach; logical arrangements; maximum computer nodes; multiple GEMM operations; nearest-neighbor nodes; performance analysis; principal limiters; scalable algorithm; torus network; Computers; Discrete Fourier transforms; Discrete cosine transforms; Libraries; Program processors; Three-dimensional displays;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Information Systems (FedCSIS), 2014 Federated Conference on
Conference_Location
Warsaw
Type
conf
DOI
10.15439/2014F374
Filename
6933071