Title :
High performance OpenSHMEM for Xeon Phi clusters: Extensions, runtime designs and application co-design
Author :
Jose, Jithin ; Hamidouche, Khaled ; Lu, Xiaoyi ; Potluri, Sreeram ; Zhang, Jie ; Tomko, Karen ; Panda, Dhabaleswar K.
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Abstract :
Intel Many Integrated Core (MIC) architectures are becoming an integral part of modern supercomputer architectures due to their high compute density and performance per watt. Partitioned Global Address Space (PGAS) programming models, such as OpenSHMEM, provide an attractive approach for developing scientific applications with irregular communication characteristics by abstracting a shared memory address space and offering one-sided communication semantics. However, the current OpenSHMEM standard does not efficiently support heterogeneous memory architectures such as Xeon Phi. Host and Xeon Phi cores have different memory capacities and compute characteristics, yet the global symmetric memory allocation in the current OpenSHMEM standard mandates that the same amount of memory be allocated on every process. In this paper, we propose extensions to overcome this restriction, along with high performance runtime-level designs for efficient communication involving Xeon Phi processors. Further, we redesign applications to demonstrate the effectiveness of the proposed designs and extensions. Experimental evaluations indicate a 4X to 7X reduction in OpenSHMEM data movement operation latencies and a 6X to 11X improvement in performance for collective operations. Application evaluations in symmetric mode indicate performance improvements of 28% at 1,024 processes. Moreover, application redesigns using the proposed extensions provide several orders of magnitude of performance improvement compared to the symmetric mode. To the best of our knowledge, this is the first research work that proposes high performance runtime designs for OpenSHMEM on Intel Xeon Phi clusters.
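To make the symmetric-allocation restriction concrete: in OpenSHMEM 1.x, symmetric heap allocation (shmalloc, later shmem_malloc) is a collective call that every PE must make with the same size, so in a mixed host/Xeon Phi job the largest usable per-PE allocation is capped by the most memory-constrained PE. The C sketch below only does that arithmetic; the pe_class struct, both helper functions, and the node sizes used in the usage note (a 64 GiB host running 16 PEs, an 8 GiB coprocessor running 60 PEs) are hypothetical illustrations, not the authors' extensions or runtime design.

```c
#include <stdint.h>

/* Hypothetical description of one class of processing element (PE) in a
   heterogeneous job. Field names are illustrative, not from any library. */
typedef struct {
    const char *kind;    /* e.g. "host" or "mic" */
    uint64_t mem_bytes;  /* memory available on that node or card */
    int pes;             /* number of PEs sharing that memory */
} pe_class;

/* Largest per-PE symmetric allocation that fits on every PE when all PEs
   must request the same size (the symmetric-heap rule). Returns
   UINT64_MAX if n == 0, i.e. no constraint. */
uint64_t max_symmetric_alloc(const pe_class *classes, int n) {
    uint64_t limit = UINT64_MAX;
    for (int i = 0; i < n; i++) {
        uint64_t per_pe = classes[i].mem_bytes / (uint64_t)classes[i].pes;
        if (per_pe < limit) {
            limit = per_pe;  /* the smallest per-PE share wins */
        }
    }
    return limit;
}

/* Memory on one PE class that stays unusable for symmetric data when
   every PE is limited to per_pe_alloc bytes. */
uint64_t stranded_bytes(const pe_class *c, uint64_t per_pe_alloc) {
    return c->mem_bytes - per_pe_alloc * (uint64_t)c->pes;
}
```

With the hypothetical job above, max_symmetric_alloc returns 143165576 bytes (about 136.5 MiB) per PE, which leaves roughly 62 GiB of the host's memory unused for symmetric data. This is the imbalance that the proposed extensions are meant to remove.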
Keywords :
memory architecture; microprocessor chips; multiprocessing systems; parallel processing; shared memory systems; Host cores; Intel Xeon Phi clusters; Intel many integrated core; MIC architectures; OpenSHMEM standard; PGAS programming models; Xeon Phi cores; Xeon Phi processors; application codesign; compute characteristics; data movement operation latencies; global symmetric memory allocation; heterogeneous memory architectures; high compute density; high performance OpenSHMEM; high performance runtime-level designs; irregular communication characteristics; memory capacities; modern supercomputer architectures; one-sided communication semantics; partitioned global address space; runtime designs; scientific applications; shared memory address space; Bandwidth; Coprocessors; Electronics packaging; Memory management; Resource management; Runtime;
Conference_Title :
Cluster Computing (CLUSTER), 2014 IEEE International Conference on
Conference_Location :
Madrid
DOI :
10.1109/CLUSTER.2014.6968754