Title :
Virtual machine based heterogeneous checkpointing
Author :
Agbaria, A. ; Friedman, R.
Author_Institution :
Dept. of Comput. Sci., Technion-Israel Inst. of Technol., Haifa, Israel
Abstract :
Checkpointing an application is the act of saving the application's state to stable storage during its execution so that, if the application fails, it can be restarted from the last saved state, thereby avoiding loss of the work already done. A heterogeneous checkpoint/restart mechanism allows an application to be restarted from a saved state that was taken on a hardware architecture and/or operating system that may differ from those of the machine on which it is restarted. This paper explores how to construct such a mechanism at the virtual machine level. That is, rather than dumping the entire state of the application process, the mechanism reported here dumps the state of the application with respect to a virtual machine. During restart, the saved state is loaded into a new copy of the virtual machine, which continues running from there. The heterogeneous checkpoint/restart mechanism reported here was developed for the OCaml variant of ML. The paper reports on the main issues encountered in building such a mechanism and the design choices made, presents performance evaluations, and discusses some lessons and ideas for extending the work to native-code OCaml and to Java Virtual Machines.
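The idea of checkpointing with respect to a virtual machine, rather than dumping the process image, can be illustrated with a minimal sketch. This is not the authors' OCaml implementation: `ToyVM`, its two opcodes, and the JSON checkpoint format are all hypothetical names and choices made only for illustration. The sketch assumes that the VM-level state (program, program counter, operand stack) is all that needs saving, and that writing it in a machine-independent text format is what lets a fresh VM on another platform resume it.

```python
import json


class ToyVM:
    """Hypothetical toy stack machine; illustration only, not the OCaml VM."""

    def __init__(self, program, pc=0, stack=None):
        self.program = program              # list of (opcode, argument) pairs
        self.pc = pc                        # index of the next instruction
        self.stack = list(stack) if stack else []

    def step(self):
        """Execute one instruction and advance the program counter."""
        op, arg = self.program[self.pc]
        self.pc += 1
        if op == "push":
            self.stack.append(arg)
        elif op == "add":
            b = self.stack.pop()
            a = self.stack.pop()
            self.stack.append(a + b)
        else:
            raise ValueError(f"unknown opcode: {op}")

    def done(self):
        return self.pc >= len(self.program)

    def run(self, max_steps=None):
        """Run to completion, or for at most max_steps instructions."""
        steps = 0
        while not self.done() and (max_steps is None or steps < max_steps):
            self.step()
            steps += 1

    def checkpoint(self):
        """Save only VM-level state, as text with no native pointers or byte order."""
        return json.dumps(
            {"program": self.program, "pc": self.pc, "stack": self.stack}
        )

    @classmethod
    def restart(cls, saved):
        """Load a saved state into a new VM instance, which continues from there."""
        state = json.loads(saved)
        program = [tuple(instr) for instr in state["program"]]
        return cls(program, state["pc"], state["stack"])


if __name__ == "__main__":
    program = [("push", 2), ("push", 3), ("add", None), ("push", 10), ("add", None)]
    vm = ToyVM(program)
    vm.run(max_steps=2)             # execute part of the program
    saved = vm.checkpoint()         # this string could be moved to another machine
    resumed = ToyVM.restart(saved)  # a new copy of the VM picks up the saved state
    resumed.run()
    print(resumed.stack)
```

A real virtual machine also has to save a heap, registers, and open resources, which this sketch deliberately leaves out.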
Keywords :
Java; ML language; software fault tolerance; software performance evaluation; system recovery; virtual machines; Java Virtual Machines; ML; OCaml; hardware architecture; heterogeneous checkpointing; operating system; performance evaluation; restart mechanism; virtual machine; Application software; Checkpointing; Computer architecture; Computer science; Hardware; Java; Memory management; Operating systems; Registers; Virtual machining;
Conference_Title :
Parallel and Distributed Processing Symposium., Proceedings International, IPDPS 2002, Abstracts and CD-ROM
Conference_Location :
Ft. Lauderdale, FL
Print_ISBN :
0-7695-1573-8
DOI :
10.1109/IPDPS.2002.1015495