DocumentCode
190739
Title
Can we put concurrency back into redundant multithreading?
Author
Döbel, Björn ; Härtig, Hermann
Author_Institution
Tech. Univ. Dresden, Dresden, Germany
fYear
2014
fDate
12-17 Oct. 2014
Firstpage
1
Lastpage
10
Abstract
Software-implemented fault tolerance (SIFT) mechanisms make it possible to tolerate transient hardware faults in commercial off-the-shelf (COTS) systems without using specialized resilient hardware. Unfortunately, existing SIFT methods at both the compiler and the operating system levels are often restricted to single-threaded applications and hence do not apply to multithreaded software on modern multicore platforms. We present RomainMT, an operating system service that provides replication for unmodified multithreaded applications. Replicating these programs is challenging, because scheduling-induced non-determinism may cause replicated threads to execute different valid code paths. This complicates the distinction between valid behavior and the effects of hardware errors. RomainMT solves these problems by transparently making multithreaded execution deterministic. We present two alternative mechanisms that differ in the assumptions made about the respective applications and investigate their performance implications. Our evaluation using the SPLASH2 benchmark suite shows that the overhead for triple-modular redundancy (TMR) is 24% for applications with two application threads and 65% for four application threads.
Keywords
concurrency (computers); multi-threading; multiprocessing systems; operating systems (computers); program compilers; software fault tolerance; COTS systems; RomainMT; SIFT mechanisms; SIFT methods; SPLASH2 benchmark suite; TMR; code paths; commercial off-the-shelf systems; concurrency; hardware errors; multicore platforms; multithreaded execution deterministic; multithreaded software; operating system service; redundant multithreading; scheduling-induced nondeterminism; single-threaded applications; software-implemented fault tolerance mechanisms; transient hardware fault tolerance; triple-modular redundancy; Benchmark testing; Fault tolerance; Hardware; Instruction sets; Libraries; Multithreading; Synchronization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Embedded Software (EMSOFT), 2014 International Conference on
Conference_Location
Jaypee Greens
Type
conf
DOI
10.1145/2656045.2656050
Filename
6986127