Title :
RoltMP - replay of Lamport timestamps for message passing systems
Author :
Ronsse, Michiel A. ; Kranzlmüller, Dieter A.
Author_Institution :
Dept. ELIS, Ghent Univ., Belgium
Abstract :
Debugging nondeterministic parallel programs is difficult because consecutive runs with the same input data may result in different executions. To overcome this problem for cyclic debugging, replay mechanisms based on trace-driven simulation have been developed. Because replay is based on a previously monitored program run, the overhead generated by the monitoring functionality is critical: it has to be small enough to keep the intrusion on the program as low as possible. An example of a replay mechanism with low intrusion is the ROLT method, which was originally developed for shared memory systems. This method uses Lamport clocks to trace the order of accesses to shared objects. Although processes in message passing systems interact in a completely different way, some ideas of ROLT are useful and can be ported to the distributed memory area. As a result, an improved monitoring and replay approach with lower overhead than other existing methods can be implemented.
Keywords :
discrete event simulation; message passing; parallel programming; performance evaluation; program debugging; shared memory systems; Lamport timestamps; ROLT method; cyclic debugging; message passing systems; nondeterministic parallel programs debugging; replay mechanisms; trace driven simulation; Clocks; Computer bugs; Concrete; Concurrent computing; Debugging; Message passing; Monitoring; Synchronization; System recovery; Telecommunication traffic
Conference_Title :
Proceedings of the Sixth Euromicro Workshop on Parallel and Distributed Processing (PDP '98), 1998
Conference_Location :
Madrid
Print_ISBN :
0-8186-8332-5
DOI :
10.1109/EMPDP.1998.647184