DocumentCode :
2906809
Title :
Reflex Barrier: A Scalable Network-Based Synchronization Barrier
Author :
Anbar, Ahmad ; Serres, Olivier ; El-Ghazawi, Tarek
Author_Institution :
Dept. of Electr. & Comput. Eng., George Washington Univ., Washington, DC, USA
fYear :
2011
fDate :
7-9 Dec. 2011
Firstpage :
204
Lastpage :
211
Abstract :
High-performance computing is witnessing the proliferation of multi-core processors in parallel architectures, and the trend is expected to grow further with emerging many-core technology, leading to hundreds of processing cores within each compute node in the near future. Alongside this trend, the total number of cores in the whole system is also increasing. To harvest the benefits of this massive parallelism, inter-process synchronization and communication should be as lightweight as possible and should require as little involvement as possible from the participating processors/cores. Synchronization algorithms that target shared-memory processors are not expected to scale on many-cores, because they rely on atomics, locks, and/or cache coherence protocols, all of which are expected to be very costly operations on many-cores. At the same time, some many-core architectures provide user-space networks on chip (NoCs) that operate similarly to regular networks. In this paper, we introduce the Reflex barrier, a new synchronization barrier algorithm that relies on fundamental networking concepts. Because the barrier relies on the characteristics of the network, it requires very little intervention from the participating processors/cores. The algorithm can also be implemented as a split-phase barrier, which provides an opportunity to reduce the synchronization cost. We implemented the algorithm using Unified Parallel C (UPC), MPI, and pThreads, and tested our implementation on the TILE64, a 64-core processor. The performance of the Reflex barrier is also analyzed and compared to that of other algorithms using performance models.
Keywords :
application program interfaces; cache storage; message passing; network-on-chip; parallel architectures; shared memory systems; synchronisation; MPI; TILE64; Unified Parallel C; cache coherence protocol; high-performance computing; inter-process synchronization; many core architecture; many-core technology; multicore processor; networks on chip; pThreads; parallel architecture; reflex barrier; scalable network-based synchronization barrier; shared memory processor; split phase; synchronization cost reduction; Coherence; Computer architecture; Message systems; Program processors; Routing; Synchronization; Tiles; Distributed memory barrier; Many-core clusters; Manycores; Reflex barrier; Synchronization barrier;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Parallel and Distributed Systems (ICPADS), 2011 IEEE 17th International Conference on
Conference_Location :
Tainan
ISSN :
1521-9097
Print_ISBN :
978-1-4577-1875-5
Type :
conf
DOI :
10.1109/ICPADS.2011.106
Filename :
6121279