DocumentCode :
3245928
Title :
Understanding Network Saturation Behavior on Large-Scale Blue Gene/P Systems
Author :
Balaji, P. ; Naik, H. ; Desai, N.
Author_Institution :
Math. & Comput. Sci. Div., Argonne Nat. Lab., Argonne, IL, USA
fYear :
2009
fDate :
8-11 Dec. 2009
Firstpage :
586
Lastpage :
593
Abstract :
As researchers continue to architect massive-scale systems, it is becoming clear that these systems will utilize a significant amount of shared hardware between processing units. Systems such as the IBM Blue Gene (BG) and Cray XT have started utilizing flat (i.e., scalable) networks, which differ from switched fabrics in that they use a 3D torus or similar topology. This allows the network to grow only linearly with system scale, instead of the superlinear growth needed for full fat-tree switched topologies, but at the cost of increased network sharing between processing nodes. While in many cases a full fat-tree is an overestimate of the needed bisection bandwidth, it is not clear whether the other extreme of a flat topology is sufficient to move data around the network efficiently. In this paper, we study the network behavior of the IBM BG/P using several application communication kernels, and we monitor network congestion behavior based on detailed hardware counters. Our studies scale from small systems to 8 racks (32,768 cores) of BG/P and provide insights into the network communication characteristics of the system.
Keywords :
Cray computers; computer network management; large-scale systems; parallel machines; telecommunication congestion control; telecommunication network topology; 3D torus; Cray XT; IBM Blue Gene; fat-tree switched topologies; large-scale Blue Gene/P systems; massive-scale systems; network congestion; network saturation behavior; network sharing; similar topology; Bandwidth; Communication switching; Costs; Counting circuits; Fabrics; Hardware; Kernel; Large-scale systems; Monitoring; Network topology; Blue Gene/P; Fat Tree; Petascale; Saturation; Torus;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Systems (ICPADS), 2009 15th International Conference on
Conference_Location :
Shenzhen
ISSN :
1521-9097
Print_ISBN :
978-1-4244-5788-5
Type :
conf
DOI :
10.1109/ICPADS.2009.117
Filename :
5395352