DocumentCode
3245928
Title
Understanding Network Saturation Behavior on Large-Scale Blue Gene/P Systems
Author
Balaji, P. ; Naik, H. ; Desai, N.
Author_Institution
Math. & Comput. Sci. Div., Argonne Nat. Lab., Argonne, IL, USA
fYear
2009
fDate
8-11 Dec. 2009
Firstpage
586
Lastpage
593
Abstract
As researchers continue to architect massive-scale systems, it is becoming clear that these systems will utilize a significant amount of shared hardware between processing units. Systems such as the IBM Blue Gene (BG) and Cray XT have started utilizing flat (i.e., scalable) networks, which differ from switched fabrics in that they use a 3D torus or similar topology. This allows the network to grow only linearly with system scale, instead of the superlinear growth needed for full fat-tree switched topologies, but at the cost of increased network sharing between processing nodes. While in many cases a full fat-tree is an overestimate of the needed bisectional bandwidth, it is not clear whether the other extreme of a flat topology is sufficient to move data around the network efficiently. In this paper, we study the network behavior of the IBM BG/P using several application communication kernels, and we monitor network congestion behavior based on detailed hardware counters. Our studies scale from small systems to 8 racks (32,768 cores) of BG/P and provide insights into the network communication characteristics of the system.
Keywords
Cray computers; computer network management; large-scale systems; parallel machines; telecommunication congestion control; telecommunication network topology; 3D torus; Cray XT; IBM Blue Gene; fat-tree switched topologies; large-scale Blue Gene/P systems; massive-scale systems; network congestion; network saturation behavior; network sharing; similar topology; Bandwidth; Communication switching; Costs; Counting circuits; Fabrics; Hardware; Kernel; Large-scale systems; Monitoring; Network topology; Blue Gene/P; Fat Tree; Petascale; Saturation; Torus;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Systems (ICPADS), 2009 15th International Conference on
Conference_Location
Shenzhen
ISSN
1521-9097
Print_ISBN
978-1-4244-5788-5
Type
conf
DOI
10.1109/ICPADS.2009.117
Filename
5395352