DocumentCode :
3558948
Title :
Efficient and Scalable Hardware-Based Multicast in Fat-Tree Networks
Author :
Coll, Salvador ; Mora, Francisco J. ; Duato, Jose ; Petrini, Fabrizio
Author_Institution :
Dept. de Ing. Electron., Univ. Politec. de Valencia, Valencia, Spain
Volume :
20
Issue :
9
fYear :
2009
Firstpage :
1285
Lastpage :
1298
Abstract :
This article presents an efficient and scalable mechanism to overcome the limitations of collective communication in switched interconnection networks in the presence of faults. As supercomputing moves toward massively parallel computers with many thousands of components, reliability becomes a challenge. In such a scenario, fat-tree networks that provide hardware support for collective communication suffer serious performance degradation in the presence of even a single faulty node. This paper describes a new mechanism to provide high-performance collective communication in such situations. The feasibility of the proposed technique is formally demonstrated. We present the design of a new hardware-based routing algorithm for multicast, which is the basis of our proposal. The proposed mechanism is implemented and experimentally evaluated. Our experimental results show that hardware-based multicast trees provide an efficient and scalable solution for collective communication in fat-tree networks, significantly outperforming traditional solutions.
Keywords :
fault tolerance; fault trees; multicast communication; multiprocessor interconnection networks; network routing; parallel machines; fat-tree network; hardware-based routing algorithm design; high-performance collective communication; interprocessor communication; massively parallel computer; performance degradation; reliability issue; scalable hardware-based multicast tree; supercomputing trend; switched interconnection network fault; Algorithm design and analysis; Communication switching; Computer network reliability; Concurrent computing; Degradation; Hardware; Multicast algorithms; Multiprocessor interconnection networks; Routing; Telecommunication network reliability; Data communications; High-speed; Interprocessor communications; Multicast; Network topology; Routing protocols; data communications; interprocessor communications; network communication; network problems; trees
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
IEEE
Conference_Location :
10/17/2008 12:00:00 AM
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/TPDS.2008.228
Filename :
4653483