DocumentCode
2906809
Title
Reflex Barrier: A Scalable Network-Based Synchronization Barrier
Author
Anbar, Ahmad ; Serres, Olivier ; El-Ghazawi, Tarek
Author_Institution
Dept. of Electr. & Comput. Eng., George Washington Univ., Washington, DC, USA
fYear
2011
fDate
7-9 Dec. 2011
Firstpage
204
Lastpage
211
Abstract
High-performance computing is witnessing the proliferation of multi-core processors in parallel architectures. The trend is expected to grow further with emerging many-core technology, which will lead to hundreds of processing cores within each compute node in the near future. Alongside this trend, the total number of cores in the whole system is also increasing. To harvest the fruits of this massive parallelism, inter-process synchronization and communication should be as lightweight as possible and should involve the participating processors/cores as little as possible. Synchronization algorithms that target shared-memory processors are not expected to scale on many-cores, because they rely on atomics, locks, and/or cache coherence protocols, all of which are expected to be very costly on many-cores. At the same time, some many-core architectures provide user-space networks on chip (NoCs) that operate similarly to regular networks. In this paper, we introduce the Reflex barrier, a new synchronization barrier algorithm that relies on fundamental networking concepts. Because the barrier relies on the characteristics of the network, it requires very little intervention from the participating processors/cores. The algorithm can also be implemented as a split-phase barrier, which provides an opportunity to reduce the synchronization cost. We implemented the algorithm using Unified Parallel C (UPC), MPI, and pThreads, and tested our implementation on TILE64, a 64-core processor. We also analyze the performance of the Reflex barrier and compare it to other algorithms using performance models.
Keywords
application program interfaces; cache storage; message passing; network-on-chip; parallel architectures; shared memory systems; synchronisation; MPI; TILE64; Unified Parallel C; cache coherence protocol; high-performance computing; inter-process synchronization; many core architecture; many-core technology; multicore processor; networks on chip; pThreads; parallel architecture; reflex barrier; scalable network-based synchronization barrier; shared memory processor; split phase; synchronization cost reduction; Coherence; Computer architecture; Message systems; Program processors; Routing; Synchronization; Tiles; Distributed memory barrier; Many-core clusters; Manycores; Reflex barrier; Synchronization barrier;
fLanguage
English
Publisher
ieee
Conference_Titel
Parallel and Distributed Systems (ICPADS), 2011 IEEE 17th International Conference on
Conference_Location
Tainan
ISSN
1521-9097
Print_ISBN
978-1-4577-1875-5
Type
conf
DOI
10.1109/ICPADS.2011.106
Filename
6121279