Title :
Methods to utilize SIMT and SIMD instruction level parallelism in tridiagonal solvers
Author :
Laszlo, Endre ; Giles, Michael B. ; Appleyard, Jeremy ; Szolgay, Peter
Author_Institution :
Univ. of Oxford, Oxford, UK
Abstract :
The most widely used parallel architectures in today's High Performance Computing systems utilize multi-core CPUs, many-core GPUs or Intel's MIC (Many Integrated Core). The effort invested in developing new algorithms and implementations greatly influences performance on these architectures, and the differences between their underlying forms of instruction-level parallelism (ILP), namely SIMT (Single Instruction Multiple Thread) and SIMD (Single Instruction Multiple Data), require different approaches. The aim of this work is to show how high performance can be achieved in solving multiple scalar and block-tridiagonal systems of equations. The Thomas algorithm is implemented on all three hardware platforms. For the GPU, we also implement a hybrid algorithm based on Parallel Cyclic Reduction and the Thomas algorithm for solving scalar problems, and a thread-level, work-sharing algorithm for block-tridiagonal problems. Performance comparisons and a discussion of efficiency are also included.
Keywords :
graphics processing units; mathematics computing; multi-threading; parallel architectures; ILP parallelism; Intel MIC; SIMD; SIMT; Thomas algorithm; block-tridiagonal system; high-performance computing systems; hybrid algorithm; many integrated core; many-core GPU; multicore CPU; parallel architectures; parallel cyclic reduction; scalar-block-tridiagonal system; single instruction multiple data; single instruction multiple thread; Educational institutions; Equations; Graphics processing units; Instruction sets; Parallel processing; Registers; Vectors;
Conference_Title :
Cellular Nanoscale Networks and their Applications (CNNA), 2014 14th International Workshop on
Conference_Location :
Notre Dame, IN
DOI :
10.1109/CNNA.2014.6888600