Title :
A comparison of architectural support for messaging in the TMC CM-5 and the Cray T3D
Author :
Karamcheti, Vijay ; Chien, Andrew A.
Author_Institution :
Dept. of Comput. Sci., Illinois Univ., Urbana, IL, USA
Abstract :
Programming models based on messaging continue to be important for parallel machines. Messaging costs are strongly influenced by a machine's network interface architecture. We examine the impact of architectural support for messaging in two machines, the TMC CM-5 and the Cray T3D, by exploring the design and performance of several messaging implementations. The additional features in the T3D support remote operations: memory access, fetch-and-increment, atomic swaps, and prefetch. Experiments on the CM-5 show that requiring processor involvement for message reception can increase the communication overheads from 60% to 300% for moderate variations in computation grain size at the destination. In contrast, the T3D hardware for remote operations decouples message reception from processor activity, producing high-performance messaging independent of computation grain size or variability. In addition, hardware support for a shared address space in the T3D can be used to solve the output contention problem (output hot spots), producing messaging implementations that are robust over a wide variety of traffic patterns. Atomic swap hardware can be used to build a distributed message queue, enabling a "pull" messaging scheme where the destination requests data transfer upon receive. This scheme uses prefetches to mask receive latency. While this yields performance that is robust under output contention, its base cost is competitive only for small messages (up to 64 bytes) because of the high cost of issuing and resolving prefetches in the T3D. Emulation shows that if the interaction costs can be reduced by a factor of eight (250 ns to 31 ns), perhaps by moving the prefetch queue on chip, and there is a corresponding increase in the prefetch queue size, the pull scheme can give superior performance in all cases.
Keywords :
computer architecture; parallel machines; performance evaluation; programming; Cray T3D; TMC CM-5; architectural support; atomic swap hardware; atomic swaps; distributed message queue; fetch-and-increment; hardware support; memory access; messaging; parallel machines; prefetch; programming model; receive latency; remote operations; shared address space; traffic patterns; Costs; Delay; Grain size; Hardware; Network interfaces; Parallel machines; Parallel programming; Prefetching; Robustness; Traffic control;
Conference_Title :
Computer Architecture, 1995. Proceedings., 22nd Annual International Symposium on
Conference_Location :
Santa Margherita Ligure, Italy
Print_ISBN :
0-89791-698-0