Title :
Designing Topology-Aware Communication Schedules for Alltoall Operations in Large InfiniBand Clusters
Author :
Subramoni, Hari ; Kandalla, Krishna ; Jose, Jithin ; Tomko, Karen ; Schulz, K. ; Pekurovsky, Dmitry ; Panda, Dhabaleswar K.
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Abstract :
Network contention is a significant factor affecting the performance of communication-intensive operations, such as the Alltoall exchanges used for the transpose operations of multi-dimensional FFTs on modern supercomputing systems. Over the last decade, InfiniBand has become an increasingly popular interconnect for deploying these systems. However, no practical schemes exist that allow the users of these systems to perform these communication operations in a network-topology-aware manner. In this paper, we propose multiple schemes to create network-topology-aware communication schedules for Alltoall/FFT operations that reduce the volume of contention encountered by the operations. Through careful study and analysis of communication performance, we derive the critical factors that result in network contention in large-scale InfiniBand clusters. We propose enhancements to our topology discovery service to generate the path matrix in a scalable and efficient manner. Through our techniques, we are able to significantly reduce the amount of network contention observed during the Alltoall/FFT operations. The results of our experimental evaluation indicate that our proposed technique is able to deliver up to a 12% improvement in the communication time of P3DFFT at 4,096 processes.
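The abstract does not spell out the scheduling algorithm itself. As a rough illustration of how a path matrix can be used to score an Alltoall schedule for contention, the Python sketch below assumes a hypothetical two-level fat tree with deterministic, destination-based routing (the spine is chosen as dst % num_spines). The topology, the routing rule, and every function name here are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

# Hypothetical two-level fat tree: each leaf switch hosts `nodes_per_leaf`
# nodes and connects to `num_spines` spine switches. Illustrative only.


def build_path_matrix(num_nodes, nodes_per_leaf, num_spines):
    """Return paths[src][dst]: the inter-switch links a src->dst flow crosses.

    Assumes deterministic destination-based routing (spine = dst % num_spines),
    loosely modelled on static InfiniBand forwarding tables.
    """
    paths = [[[] for _ in range(num_nodes)] for _ in range(num_nodes)]
    for src in range(num_nodes):
        for dst in range(num_nodes):
            src_leaf, dst_leaf = src // nodes_per_leaf, dst // nodes_per_leaf
            if src == dst or src_leaf == dst_leaf:
                continue  # same leaf switch: no inter-switch link is used
            spine = dst % num_spines
            paths[src][dst] = [("up", src_leaf, spine), ("down", spine, dst_leaf)]
    return paths


def shift_schedule(num_nodes):
    """Pairwise Alltoall schedule: at step k, rank i sends to (i + k) mod P."""
    return [[(i, (i + k) % num_nodes) for i in range(num_nodes)]
            for k in range(1, num_nodes)]


def step_contention(step, paths):
    """Maximum number of concurrent flows sharing any single link in one step."""
    load = Counter(link for src, dst in step for link in paths[src][dst])
    return max(load.values(), default=0)


def schedule_contention(schedule, paths):
    """Per-step maximum link load for a whole schedule."""
    return [step_contention(step, paths) for step in schedule]


if __name__ == "__main__":
    P, PER_LEAF = 8, 4
    sched = shift_schedule(P)
    for spines in (4, 2):
        paths = build_path_matrix(P, PER_LEAF, spines)
        print(f"spines={spines}: {schedule_contention(sched, paths)}")
```

With as many spines as nodes per leaf, every step of the shift schedule has a maximum link load of 1 in this model; halving the spines gives `[1, 1, 2, 2, 2, 1, 1]`, meaning some steps force two flows onto the same link. A topology-aware scheduler would aim to reorder or re-pair exchanges so that such peaks stay low.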
Keywords :
computer network performance evaluation; fast Fourier transforms; message passing; parallel machines; scheduling; telecommunication network topology; workstation clusters; FFT; InfiniBand clusters; MPI_Alltoall operations; communication performance analysis; fast Fourier transform; message passing; network contention; network topology-aware communication schedules; supercomputing systems; Algorithm design and analysis; Libraries; Network topology; Radiation detectors; Resource management; Schedules; Topology; Alltoall; Clusters; InfiniBand; Topology;
Conference_Title :
Parallel Processing (ICPP), 2014 43rd International Conference on
Conference_Location :
Minneapolis, MN
DOI :
10.1109/ICPP.2014.32