DocumentCode :
1925260
Title :
Minimizing Network Contention in InfiniBand Clusters with a QoS-Aware Data-Staging Framework
Author :
Rajachandrasekar, Raghunath ; Jaswani, Jai ; Subramoni, Hari ; Panda, Dhabaleswar K DK
Author_Institution :
Network-Based Comput. Lab., Ohio State Univ., Columbus, OH, USA
fYear :
2012
fDate :
24-28 Sept. 2012
Firstpage :
329
Lastpage :
336
Abstract :
The rapid growth of supercomputing systems, both in scale and complexity, has been accompanied by a degradation in system efficiency. The sheer abundance of resources, including millions of cores, vast amounts of physical memory, and high-bandwidth networks, is heavily under-utilized. This happens when the resources are time-shared among parallel applications that are scheduled to run on a subset of compute nodes in an exclusive manner. Several space-sharing techniques proposed in the literature allow parallel applications to be co-located on compute nodes and to share resources with each other. Although this leads to better system efficiency, it also causes contention for system resources. In this work, we specifically address the problem of network contention caused by parallel applications and file systems sharing network resources simultaneously. We leverage the Quality-of-Service (QoS) capabilities of the widely used InfiniBand interconnect to enhance our data-staging file system, making it QoS-aware. This is a user-level framework that is agnostic to the file system and the MPI implementation. Using this file system, we demonstrate the isolation of file system traffic from MPI communication traffic, thereby reducing network contention. Experimental results show that, in the presence of contention with I/O traffic, MPI point-to-point latency can be reduced by up to 320 microseconds and bandwidth improved by up to 674 MB/s. Furthermore, in the presence of network contention, we were able to reduce the runtime of the AWP-ODC MPI application by about 9.89% and the time spent in communication by the NAS CG kernel by 23.46%.
Keywords :
application program interfaces; file organisation; input-output programs; parallel machines; pattern clustering; processor scheduling; quality of service; AWP-ODC MPI; I/O traffic; InfiniBand cluster; MPI communication traffic; NAS CG kernel; QoS; bandwidth 674 MB/s; data staging filesystem; data staging framework; network contention minimization; network resource sharing; parallel application; point-to-point latency; quality of service; scheduling; sheer abundance; space sharing technique; supercomputing system; Bandwidth; Fabrics; Kernel; Libraries; Noise; Quality of service; Servers; Data-Staging; InfiniBand; Network Contention and Filesystems; Quality-of-Service; Space-Sharing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cluster Computing (CLUSTER), 2012 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-2422-9
Type :
conf
DOI :
10.1109/CLUSTER.2012.90
Filename :
6337795