Abstract :
We examine the design, implementation, and experimental analysis of parallel priority queues for device and network simulation. We consider: 1) distributed splay trees using MPI; 2) concurrent heaps using shared memory atomic locks; and 3) a new, more general concurrent data structure based on distributed sorted lists, designed to provide dynamically balanced work allocation and efficient use of shared memory resources. We evaluate the performance of all three data structures on a Cray-T3E900 system at KFA-Jülich. Our comparisons are based on simulations of single buffers and of a 64×64 packet switch that supports multicasting. In all implementations, PEs monitor traffic at their preassigned input/output ports, while priority queue elements are distributed across the Cray-T3E virtual shared memory. Our experiments with up to 60000 packets and 2 to 64 PEs indicate that concurrent priority queues perform much better than distributed ones. The two concurrent implementations have comparable performance, while our new data structure uses less memory and has been further optimized. We also consider parallel simulation for symmetric networks by sorting integer conflict functions and implementing a packet indexing scheme. The optimized message passing network simulator can process about 500 K packet moves in one second, with an efficiency above roughly 50 percent for a few thousand packets on the Cray-T3E with 32 PEs. All developed data structures form a parallel library. Although our concurrent implementations use the Cray-T3E ShMem library, portability can be derived from Open-MP or MPI-2 standard libraries, which will provide support for one-way communication and shared memory lock mechanisms.
Keywords :
Cray computers; abstract data types; application program interfaces; digital simulation; message passing; packet switching; parallel programming; queueing theory; shared memory systems; software libraries; sorting; virtual storage; Cray-T3E virtual shared memory; Cray-T3E ShMem library; Cray-T3E900 system; MPI; MPI-2 standard libraries; Open-MP; concurrent data structure; concurrent heaps; concurrent implementations; concurrent priority queues; data structure; distributed sorted lists; distributed splay trees; dynamically balanced work allocation; integer conflict functions; multicasting; network simulation; optimized message passing network simulator; packet indexing scheme; packet switch; parallel library; parallel priority queues; parallel simulation; preassigned input/output ports; priority queue elements; priority queues; shared memory atomic locks; shared memory lock mechanisms; shared memory resources; single buffers; sorting methods; symmetric networks; Analytical models; Data structures; Libraries; Packet switching; Queueing analysis; Resource management; Sorting; Switches; Traffic control; Tree data structures;