Author Institution :
Dept. of Comput. Sci., Virginia Tech, Blacksburg, VA, USA
Abstract :
General-purpose computing on an ever-broadening array of parallel devices has led to an increasingly complex and multi-dimensional landscape with respect to programmability and performance optimization. The growing diversity of parallel architectures presents many challenges to the domain scientist, including device selection, choice of programming model, and level of investment in optimization, all of which influence the balance between programmability and performance. In this paper, we characterize the performance achievable across a range of optimizations, along with their programmability, for multi- and many-core platforms (specifically, an Intel Sandy Bridge CPU, an Intel Xeon Phi coprocessor, and an NVIDIA Kepler K20 GPU) in the context of an n-body, molecular-modeling application called GEM. Our systematic approach to optimization delivers implementations with speed-ups of 194.98×, 885.18×, and 1020.88× on the CPU, Xeon Phi, and GPU, respectively, over the naive serial version. Beyond the speed-ups, we characterize the incremental optimization of the code, from naive serial to fully hand-tuned on each platform, through four distinct phases of increasing complexity, exposing the strengths and weaknesses of each platform's programming models.
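To make the optimization trajectory described above concrete, the sketch below shows the general shape of an n-body potential kernel of the kind at the heart of GEM, first as a naive serial loop and then as one possible parallelized and vectorized phase. The data layout, function names, and the plain Coulomb-style pairwise sum are illustrative assumptions rather than GEM's actual code; the same loop nest could equally be offloaded with OpenACC or rewritten as a CUDA kernel for the K20 GPU.

    /*
     * Illustrative n-body surface-potential kernel, in the spirit of the
     * computation GEM performs.  Formula, names, and data layout are
     * assumptions for demonstration only; they are not taken from GEM.
     */
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { float x, y, z, q; } Charge;    /* point charge   */
    typedef struct { float x, y, z;    } Vertex;    /* surface vertex */

    /* Naive serial baseline: one pairwise sum per surface vertex. */
    void potential_serial(const Vertex *v, int nv,
                          const Charge *c, int nc, float *phi)
    {
        for (int i = 0; i < nv; i++) {
            float sum = 0.0f;
            for (int j = 0; j < nc; j++) {
                float dx = v[i].x - c[j].x;
                float dy = v[i].y - c[j].y;
                float dz = v[i].z - c[j].z;
                sum += c[j].q / sqrtf(dx*dx + dy*dy + dz*dz);
            }
            phi[i] = sum;
        }
    }

    /* One possible parallel + vectorized phase: OpenMP threads over the
     * outer loop of vertices, SIMD hint on the inner reduction. */
    void potential_parallel(const Vertex *v, int nv,
                            const Charge *c, int nc, float *phi)
    {
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < nv; i++) {
            float sum = 0.0f;
            #pragma omp simd reduction(+:sum)
            for (int j = 0; j < nc; j++) {
                float dx = v[i].x - c[j].x;
                float dy = v[i].y - c[j].y;
                float dz = v[i].z - c[j].z;
                sum += c[j].q / sqrtf(dx*dx + dy*dy + dz*dz);
            }
            phi[i] = sum;
        }
    }

    int main(void)
    {
        const int nv = 1024, nc = 512;
        Vertex *v   = malloc(nv * sizeof *v);
        Charge *c   = malloc(nc * sizeof *c);
        float  *phi = malloc(nv * sizeof *phi);

        /* Deterministic dummy geometry: vertices on an integer lattice,
         * charges offset by 0.5 so no distance is ever zero. */
        for (int i = 0; i < nv; i++)
            v[i] = (Vertex){ i % 16, (i / 16) % 16, i / 256 };
        for (int j = 0; j < nc; j++)
            c[j] = (Charge){ j % 8 + 0.5f, (j / 8) % 8 + 0.5f,
                             j / 64 + 0.5f, (j % 2) ? 1.0f : -1.0f };

        potential_parallel(v, nv, c, nc, phi);
        printf("phi[0] = %f\n", phi[0]);

        free(v); free(c); free(phi);
        return 0;
    }

If built with OpenMP support (for example, gcc -O3 -fopenmp), the second kernel threads across surface vertices and vectorizes the inner reduction; without OpenMP the pragmas are ignored and it degrades to the serial form. Hand-tuned phases of the kind the paper evaluates would go further, for example by reorganizing data for aligned AVX loads on the CPU and Xeon Phi or by tiling charges through shared memory in a CUDA kernel on the GPU.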
Keywords :
coprocessors; general purpose computers; multiprocessing systems; optimisation; parallel architectures; performance evaluation; GEM; Intel Sandy Bridge CPU; Intel Xeon Phi coprocessor; NVIDIA Kepler K20 GPU; device selection; general-purpose computing; heterogeneous platform performance; incremental optimization; many-core platforms; multicore platform; n-body molecular-modeling application; naive serial; optimization investment level; parallel devices; performance optimization; programmability; programming models; Computer architecture; Graphics processing units; Mathematical model; Optimization; Performance evaluation; Programming; Vectors; AVX; CUDA; GPU; Intel MIC; NVIDIA Kepler K20; OpenACC; Xeon Phi; performance