Title :
Scalable collective communication on the ASCI Q machine
Author :
Petrini, Fabrizio ; Fernandez, Juan ; Frachtenberg, Eitan ; Coll, Salvador
Author_Institution :
Computer & Computational Sciences (CCS) Division, Los Alamos National Laboratory, NM, USA
Abstract :
Scientific codes spend a considerable part of their run time executing collective communication operations. Such operations can also be critical for efficient resource management in large-scale machines. Scalable collective communication is therefore a key factor in achieving good performance on large-scale parallel computers. In this paper, we describe the performance and scalability of some common collective communication patterns on the ASCI Q machine. Experiments conducted on a 1024-node/4096-processor segment show that the network is fast and scalable. The network can barrier-synchronize in a few tens of μs, perform a broadcast with an aggregate bandwidth of more than 100 GB/s, and sustain heavy hot-spot traffic with limited performance degradation.
Keywords :
multiprocessor interconnection networks; network topology; parallel machines; performance evaluation; synchronisation; 100 GB/s; ASCI Q machine; cluster processors; hot-spot traffic; large-scale parallel computers; network barrier-synchronization; network broadcast aggregate bandwidth; resource management; scalable collective communication; scientific codes; supercomputer interconnection network; Application specific integrated circuits; Broadcasting; Communication switching; Laboratories; Large-scale systems; Network topology; Resource management; Scalability; Storms; Switches
Conference_Title :
High Performance Interconnects, 2003. Proceedings. 11th Symposium on
Print_ISBN :
0-7695-2012-X
DOI :
10.1109/CONECT.2003.1231478