DocumentCode :
2044731
Title :
Using SCTP to hide latency in MPI programs
Author :
Kamal, H. ; Penoff, B. ; Tsai, M. ; Vong, E. ; Wagner, A.
Author_Institution :
Dept. of Comput. Sci., British Columbia Univ., Vancouver, BC
fYear :
2006
fDate :
25-29 April 2006
Abstract :
A difficulty in using heterogeneous collections of geographically distributed machines across wide area networks for parallel computing is the huge variability in message latency, which is orders of magnitude larger than that experienced by parallel programs executing on dedicated systems. This variability is in part due to the underlying network bandwidth and latency, which can vary dramatically according to network conditions. Although such an environment is not suitable for many message passing programs, there are programs that can take advantage of it. Using SCTP (Stream Control Transmission Protocol) for MPI, we show how to reduce the effect of latency on task farm programs to allow them to execute effectively in high-latency environments. SCTP is a recently standardized transport level protocol that has a number of features that make it well-suited to MPI, and our goal is to reduce the effect of latency on MPI programs in wide area networks. We take advantage of SCTP's improved congestion control as well as its ability to have multiple independent message streams over a single connection to eliminate the head-of-line blocking that can occur in TCP-based middleware. The use of streams required a novel use of MPI tags to identify independent streams rather than different types of messages. We describe the design of a task farm template that exploits streams and uses buffering and pipelining of task requests to improve its performance under network loss and variable latency. We use these techniques to improve the performance of two real-world MPI programs: a robust correlation matrix computation and mpiBLAST.
Keywords :
message passing; parallel processing; transport protocols; wide area networks; MPI programs; SCTP; Stream Control Transmission Protocol; TCP-based middleware; congestion control; correlation matrix computation; distributed machines; latency hiding; message latency; message passing programs; mpiBLAST; network bandwidth; parallel computing; task farm programs; task farm template design; transport level protocol; wide area networks; Bandwidth; Delay; Message passing; Middleware; Parallel processing; Performance loss; Pipeline processing; Robustness; Transport protocols; Wide area networks;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International
Conference_Location :
Rhodes Island
Print_ISBN :
1-4244-0054-6
Type :
conf
DOI :
10.1109/IPDPS.2006.1639391
Filename :
1639391