Regional Information Center for Science and Technology - Reflex Barrier: A Scalable Network-Based Synchronization Barrier

DocumentCode :

2906809

Title :

Reflex Barrier: A Scalable Network-Based Synchronization Barrier

Author :

Anbar, Ahmad ; Serres, Olivier ; El-Ghazawi, Tarek

Author_Institution :

Dept. of Electr. & Comput. Eng., George Washington Univ., Washington, DC, USA

fYear :

2011

fDate :

7-9 Dec. 2011

Firstpage :

204

Lastpage :

211

Abstract :

High-performance computing is witnessing the proliferation of multi-core processors in parallel architectures, and the trend is expected to increase further with emerging many-core technology, leading to hundreds of processing cores within each compute node in the near future. Alongside this trend, the total number of cores in the whole system is also increasing. To harvest the benefits of this massive parallelism, inter-process synchronization and communication should be as lightweight as possible and should require as little involvement as possible from the participating processors/cores. Synchronization algorithms that target shared-memory processors are not expected to scale on many-cores, because they rely on atomics, locks, and/or cache coherence protocols, all of which are expected to be very costly on many-cores. At the same time, some many-core architectures provide user-space networks on chip (NoCs) that operate similarly to regular networks. In this paper, we introduce the Reflex barrier, a new synchronization barrier algorithm that relies on fundamental networking concepts. Because the barrier relies on the characteristics of the network, it requires very little intervention from the participating processors/cores. The algorithm can also be implemented as a split-phase barrier, which provides an opportunity to reduce the synchronization cost. We implemented the algorithm using Unified Parallel C (UPC), MPI, and pThreads, and tested our implementation on TILE64, a 64-core processor. The performance of the Reflex barrier is also analyzed and compared with other algorithms using performance models.
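The abstract mentions that the barrier can be implemented in split-phase form. The sketch below only illustrates what "split phase" means at the interface level: a non-blocking notify() that signals arrival, followed later by a blocking wait(), mirroring the shape of UPC's upc_notify/upc_wait pair. It is NOT the Reflex barrier algorithm, whose network-based mechanism is not described in this record. It deliberately uses an ordinary lock-and-counter implementation, which is exactly the kind of shared-memory approach the abstract argues scales poorly on many-cores. The names SplitPhaseBarrier, notify, wait, and run_demo are hypothetical and chosen for illustration.

```python
import threading


class SplitPhaseBarrier:
    """Hypothetical generic split-phase barrier (NOT the Reflex algorithm).

    notify() signals arrival without blocking and returns a phase token;
    wait(token) blocks until every party has notified for that phase.
    A lock/condition counter is used purely for illustration.
    """

    def __init__(self, parties: int) -> None:
        self._parties = parties
        self._arrived = 0
        self._phase = 0  # number of completed barrier episodes
        self._cond = threading.Condition()

    def notify(self) -> int:
        """Signal arrival; return the phase this caller joined."""
        with self._cond:
            phase = self._phase
            self._arrived += 1
            if self._arrived == self._parties:
                # The last arrival completes the phase and wakes all waiters.
                self._arrived = 0
                self._phase += 1
                self._cond.notify_all()
            return phase

    def wait(self, phase: int) -> None:
        """Block until the phase returned by notify() has completed."""
        with self._cond:
            self._cond.wait_for(lambda: self._phase > phase)


def run_demo(parties: int, steps: int) -> list:
    """Each thread records (step, thread_id) before arriving at the barrier."""
    barrier = SplitPhaseBarrier(parties)
    log = []  # list.append is thread-safe in CPython

    def worker(tid: int) -> None:
        for step in range(steps):
            log.append((step, tid))
            token = barrier.notify()
            # Work placed here, between notify() and wait(), must not depend
            # on other threads' results for this step; it overlaps with the
            # barrier, which is where a split-phase barrier can hide cost.
            barrier.wait(token)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(parties)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return log


if __name__ == "__main__":
    log = run_demo(parties=4, steps=3)
    # No thread records step s+1 before every thread has recorded step s,
    # so the printed step numbers appear in non-decreasing order.
    print([step for step, _ in log])
```

The separation of notify() and wait() is the design point: a thread that has no dependency on the other threads' current-step results can keep computing after announcing its arrival, instead of idling inside a single blocking barrier call.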

Keywords :

application program interfaces; cache storage; message passing; network-on-chip; parallel architectures; shared memory systems; synchronisation; MPI; TILE64; Unified Parallel C; cache coherence protocol; high-performance computing; inter-process synchronization; many core architecture; many-core technology; multicore processor; networks on chip; pThreads; parallel architecture; reflex barrier; scalable network-based synchronization barrier; shared memory processor; split phase; synchronization cost reduction; Coherence; Computer architecture; Message systems; Program processors; Routing; Synchronization; Tiles; Distributed memory barrier; Many-core clusters; Manycores; Reflex barrier; Synchronization barrier;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel and Distributed Systems (ICPADS), 2011 IEEE 17th International Conference on

Conference_Location :

Tainan

ISSN :

1521-9097

Print_ISBN :

978-1-4577-1875-5

Type :

conf

DOI :

10.1109/ICPADS.2011.106

Filename :

6121279

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2906809