Title :
Orthros: A High-Reliability Operating System with Transmigration of Processes
Author :
Yoshida, Kenta ; Saito, Sakuyoshi ; Mouri, Kousuke ; Matsuo, Hiroshi
Author_Institution :
Nagoya Inst. of Technol., Nagoya, Japan
Abstract :
We propose a method to solve problems that accompany recovery from operating system (OS) failures. First, to reduce recovery time, we run two OSes simultaneously on one computer and configure them as an active-backup structure. This structure enables fast recovery from failures through failover. With the proposed method, recovery takes about 0.4 seconds at minimum and at most about 10 seconds, even when 2 GB of memory is restored. Next, to let services continue smoothly after recovery, the proposed method preserves processes, their network connections, and file caches, and incurs no runtime overhead for obtaining process execution status from the running active OS before a crash. In addition, building the active-backup structure consumes only one CPU core and a small amount of memory. The hardware required to implement the proposed method is a multi-core processor and one disk for each OS; consequently, the proposed method can be introduced at low cost. In the evaluation, we confirmed that downtime was at most about 1.5 seconds when the active OS of the proposed system crashed while running a text editor, an NFS server, and a database server.
Keywords :
cache storage; multiprocessing systems; operating systems (computers); system recovery; transport protocols; CPU core; NFS server; OS failure recovery; OSes; Orthros; active-backup structure; database server; downtime; file caches; high-reliability operating system; multicore processor; network connections; operating system failure recovery time reduction; process execution status; process transmigration; resource consumption; running active OS; service continuation; system crashing; text editor; Computer bugs; Computers; Hardware; Kernel; Runtime; file cache migration; operating system; process migration; recovery;
Conference_Title :
Dependable Computing (PRDC), 2013 IEEE 19th Pacific Rim International Symposium on
Conference_Location :
Vancouver, BC
DOI :
10.1109/PRDC.2013.54