DocumentCode :
2552362
Title :
Collective operations for wide-area message passing systems using adaptive spanning trees
Author :
Saito, Hideo ; Taura, Kenjiro ; Chikayama, Takashi
Author_Institution :
Tokyo Univ., Japan
fYear :
2005
fDate :
13-14 Nov. 2005
Abstract :
We propose a method for wide-area message passing systems to perform collective operations using dynamically created spanning trees. In our proposal, broadcasts and reductions are performed efficiently using topology-aware spanning trees constructed at run-time; processors autonomously measure latency and bandwidth to create latency-aware trees for short messages and bandwidth-aware trees for long messages. Our spanning trees adapt to topology changes due to the joining or leaving of processors; when processors join or leave a computation, processors repair the spanning trees so that effective execution of collective operations can continue. With 128 to 201 processors distributed over 3 to 4 clusters, the latency of our broadcast was within a factor of 2 of a static topology-aware implementation, and our broadcast achieved 82 percent of the bandwidth of a static topology-aware implementation. Moreover, when some processors joined or left a computation, our broadcast temporarily performed poorly for about 8 seconds while the spanning trees adapted to the new topology, but completed successfully even during this time.
Keywords :
message passing; trees (mathematics); adaptive spanning trees; bandwidth measurement; collective operations; latency measurement; latency-aware trees; topology-aware spanning trees; wide-area message passing systems; Adaptive systems; Bandwidth; Broadcasting; Computer networks; Delay; High performance computing; Large-scale systems; Message passing; Network topology; Proposals;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Grid Computing, 2005. The 6th IEEE/ACM International Workshop on
Print_ISBN :
0-7803-9492-5
Type :
conf
DOI :
10.1109/GRID.2005.1542722
Filename :
1542722