DocumentCode
2635039
Title
Efficient communication using total-exchange
Author
Rao, Satish ; Suel, Torsten ; Tsantilas, Thanasis ; Goudreau, Mark
Author_Institution
NEC Res. Inst., Princeton, NJ, USA
fYear
1995
fDate
25-28 Apr 1995
Firstpage
544
Lastpage
550
Abstract
A central question in parallel computing is to determine the extent to which one can write parallel programs using a high-level, general-purpose, and architecture-independent programming language and have them executed on a variety of parallel and distributed architectures without sacrificing efficiency. A large body of research suggests that, at least in theory, general-purpose parallel computing is indeed possible provided certain conditions are met: an excess of logical parallelism in the program, and the ability of the target architecture to efficiently realize balanced communication patterns. The canonical example of a balanced communication pattern is an h-relation, in which each processor is the origin and destination of at most h messages. A plethora of protocols has been designed for routing h-relations in a variety of networks. The goal has been to minimize the value of h while guaranteeing delivery of the messages within a time constant factor from optimal. In this paper we describe protocols that meet the most stringent efficiency requirement, namely delivery of messages within time that is a lower order additive term from the best achievable. Such protocols are called 1-optimal. While these protocols achieve 1-optimality only for heavily loaded networks, that is, for large values of h, they are remarkable for their simplicity in that they only use the total-exchange communication primitive. The total-exchange can be realized in many networks using very simple, contention-free, and extremely efficient schemes. The technical contribution of this paper is a protocol to route random h-relations in an N-processor network using h/N(1+o(1))+O(log log N) total-exchange rounds with high probability. Using message duplication, we can improve the bound to h/N(1+o(1))+O(log*N). This improves upon the h/N(1+o(1))+O(log N) bound of Gerbessiotis and Valiant. While our theoretical improvements are modest, our experimental results show an improvement over the protocol of A. Gerebessiotis and L.G. Valiant
Keywords
parallel processing; parallel programming; protocols; N-processor network; architecture-independent programming language; balanced communication patterns; h-relation; logical parallelism; parallel computing; parallel programs; protocols; Computer architecture; Computer languages; Computer science; Concurrent computing; Context; National electric code; Optical computing; Parallel processing; Phase change random access memory; Routing protocols;
fLanguage
English
Publisher
ieee
Conference_Titel
Parallel Processing Symposium, 1995. Proceedings., 9th International
Conference_Location
Santa Barbara, CA
Print_ISBN
0-8186-7074-6
Type
conf
DOI
10.1109/IPPS.1995.395984
Filename
395984
Link To Document