Title :
Scheduling message processing for reducing rollback propagation
Author :
Wang, Y.-M. ; Fuchs, W.K.
Author_Institution :
Coordinated Sci. Lab., Illinois Univ., Urbana, IL, USA
Abstract :
The authors show that the probability of rollback propagation in a message-passing system can often be greatly reduced by reordering the processing of messages. Rollback propagation was also measured for several parallel programs. A scheduling algorithm for message processing and its implementation for reducing rollback propagation are described. The algorithm incorporates a user-transparent prioritized scheme based on the run-time communication and checkpointing history. Communication trace-driven simulation for several parallel programs written in the Chare Kernel language demonstrated that the probability of rollback propagation can be reduced at the cost of a slight additional performance degradation.
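The quantity the abstract measures, rollback propagation, can be illustrated with a small trace-driven computation. The sketch below is an illustration under stated assumptions, not the authors' scheduling algorithm or simulator. It assumes a hypothetical trace format (`"ckpt"`, `"send"`, `"recv"` tuples in a causally consistent total order) and a hypothetical function `rolled_back_processes`. It applies the standard orphan-message rule for uncoordinated checkpointing: when a failed process restores its latest checkpoint, any receiver whose processing of a now-undone send survives must itself roll back to a checkpoint taken before that receipt, repeated until no orphan messages remain.

```python
from collections import defaultdict

# HYPOTHETICAL trace format (an assumption, not the paper's):
#   ("ckpt", proc)          proc takes a local checkpoint
#   ("send", proc, msg_id)  proc sends message msg_id
#   ("recv", proc, msg_id)  proc processes message msg_id
# List order is assumed to be a total order consistent with causality.


def rolled_back_processes(trace, failed, fail_time):
    """Processes forced to roll back when `failed` crashes just before
    trace index `fail_time`, under uncoordinated checkpointing."""
    ckpts = defaultdict(list)  # proc -> checkpoint indices, ascending
    sends, recvs = {}, {}      # msg_id -> (trace index, proc)
    procs = {failed}
    for i, ev in enumerate(trace[:fail_time]):
        kind, proc = ev[0], ev[1]
        procs.add(proc)
        if kind == "ckpt":
            ckpts[proc].append(i)
        elif kind == "send":
            sends[ev[2]] = (i, proc)
        elif kind == "recv":
            recvs[ev[2]] = (i, proc)

    def latest_ckpt_before(proc, idx):
        # Latest checkpoint strictly before idx; 0 means the initial state.
        earlier = [c for c in ckpts[proc] if c < idx]
        return earlier[-1] if earlier else 0

    # cut[p]: p's events at indices < cut[p] survive the rollback.
    cut = {p: fail_time for p in procs}
    cut[failed] = latest_ckpt_before(failed, fail_time)

    changed = True
    while changed:  # propagate until no orphan messages remain
        changed = False
        for msg, (r, receiver) in recvs.items():
            s, sender = sends[msg]
            # Orphan message: its receipt survives but its send was undone,
            # so the receiver must restart from a checkpoint before receipt.
            if r < cut[receiver] and s >= cut[sender]:
                cut[receiver] = latest_ckpt_before(receiver, r)
                changed = True
    return {p for p, c in cut.items() if c < fail_time}


# A sends m1 to B; B checkpoints only after processing m1.
demo_propagates = [("ckpt", "A"), ("ckpt", "B"), ("send", "A", "m1"),
                   ("recv", "B", "m1"), ("ckpt", "B")]
# Same messages, but A checkpoints after sending m1.
demo_contained = [("ckpt", "A"), ("ckpt", "B"), ("send", "A", "m1"),
                  ("recv", "B", "m1"), ("ckpt", "A")]

print(sorted(rolled_back_processes(demo_propagates, "A", 5)))  # ['A', 'B']
print(sorted(rolled_back_processes(demo_contained, "A", 5)))   # ['A']
```

The two demo traces contain the same messages and differ only in where one checkpoint falls relative to the message events. In the first, the failure of A drags B back; in the second, the rollback stays confined to A. This is the kind of ordering sensitivity that, per the abstract, reordering message processing can exploit. Averaging such a check over many failure points in a trace would give an empirical estimate of the probability of rollback propagation.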
Keywords :
fault tolerant computing; message passing; parallel programming; scheduling; Chare Kernel language; checkpointing; message processing scheduling; parallel programs; rollback propagation; run-time communication; trace-driven simulation; user-transparent prioritized scheme; Checkpointing; Costs; Degradation; History; Kernel; NASA; Processor scheduling; Runtime; Scheduling algorithm; Synchronization;
Conference_Title :
Twenty-Second International Symposium on Fault-Tolerant Computing (FTCS-22), 1992. Digest of Papers
Conference_Location :
Boston, MA, USA
Print_ISBN :
0-8186-2875-8
DOI :
10.1109/FTCS.1992.243599