Title :
Selective checkpointing and rollbacks in multi-threaded object-oriented environment
Author :
Kasbekar, Mangesh ; Narayanan, Chandramouli ; Das, Chita R.
Author_Institution :
Pennsylvania State Univ., University Park, PA, USA
fDate :
12/1/1999
Abstract :
This paper presents selective checkpointing and rollback schemes for multithreaded, object-oriented (MT-OO) programs, which need checkpointing mechanisms more sophisticated than traditional process-level checkpointing. The program model, theoretical foundations, and an implementation of the selective checkpointing and rollback schemes are described. The usefulness of the schemes is demonstrated by using them to implement conversations, a higher-level fault-tolerance scheme. The performance implications are studied on a prototype Internet e-commerce server. Using the selective schemes in the prototype server appreciably reduced the loss of work in the presence of faults, and the benefits are more pronounced at higher levels of concurrency in the server. With respect to completion times, the selective scheme usually outperforms the hypothetical zero-cost global scheme in the presence of faults. The experiments also show the vast difference between the sizes of selective checkpoints and global checkpoints. The concurrent sessions scheme (based on the concept of relaxed conversations) required 160 checkpoints in less than an hour. Traditionally, such a scheme would be considered outrageous, but the selective schemes still improve performance in the presence of faults.
Keywords :
multi-threading; object-oriented programming; software fault tolerance; Internet e-commerce-server; completion time; fault tolerant software; global checkpoints; higher level fault-tolerance scheme; multi-threaded object-oriented environment; performance implications; program model; rollbacks; selective checkpointing; theoretical foundations; Checkpointing; Computer crashes; Fault tolerance; Libraries; Object oriented modeling; Object oriented programming; Prototypes; Runtime; Web server; Yarn;
Journal_Title :
Reliability, IEEE Transactions on