DocumentCode :
3740640
Title :
Efficient Barrier Implementation on the POWER8 Processor
Author :
C. D. Sudheer;Ashok Srinivasan
Author_Institution :
IBM Res., New Delhi, India
Year :
2015
Firstpage :
165
Lastpage :
173
Abstract :
POWER8 is a new generation of POWER processor capable of 8-way simultaneous multi-threading per core. High-performance computing capabilities, such as a high degree of instruction-level and thread-level parallelism, are integrated with a deep memory hierarchy. Fine-grained parallel applications running on such architectures often rely on an efficient barrier implementation for synchronization. We present a variety of barrier implementations for a 4-chip POWER8 node. These implementations are optimized based on a careful study of the POWER8 memory subsystem. Our best implementation takes one to two orders of magnitude less time than the current MPI-based and POSIX-threads-based barrier implementations on POWER8. Apart from providing efficient barrier implementations, this work is also significant in demonstrating how certain features of the memory subsystem, such as NUMA access to remote L3 cache and the impact of prefetching, can be used to design efficient primitives on the POWER8.
Keywords :
"Prefetching","Sockets","Servers","Computer architecture","Optimization","Bandwidth","Benchmark testing"
Publisher :
IEEE
Conference_Title :
High Performance Computing (HiPC), 2015 IEEE 22nd International Conference on
Type :
conf
DOI :
10.1109/HiPC.2015.51
Filename :
7397630