• DocumentCode
    190739
  • Title
    Can we put concurrency back into redundant multithreading?
  • Author
    Döbel, Björn ; Härtig, Hermann
  • Author_Institution
    Tech. Univ. Dresden, Dresden, Germany
  • fYear
    2014
  • fDate
    12-17 Oct. 2014
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    Software-implemented fault tolerance (SIFT) mechanisms make it possible to tolerate transient hardware faults in commercial off-the-shelf (COTS) systems without using specialized resilient hardware. Unfortunately, existing SIFT methods at both the compiler and the operating system levels are often restricted to single-threaded applications and hence do not apply to multithreaded software on modern multicore platforms. We present RomainMT, an operating system service that provides replication for unmodified multithreaded applications. Replicating these programs is challenging because scheduling-induced non-determinism may cause replicated threads to execute different valid code paths. This complicates the distinction between valid behavior and the effects of hardware errors. RomainMT solves these problems by transparently making multithreaded execution deterministic. We present two alternative mechanisms that differ in the assumptions they make about the respective applications, and we investigate their performance implications. Our evaluation using the SPLASH2 benchmark suite shows that the overhead for triple-modular redundancy (TMR) is 24% for applications with two application threads and 65% for applications with four application threads.
  • Keywords
    concurrency (computers); multi-threading; multiprocessing systems; operating systems (computers); program compilers; software fault tolerance; COTS systems; RomainMT; SIFT mechanisms; SIFT methods; SPLASH2 benchmark suite; TMR; code paths; commercial off-the-shelf systems; concurrency; hardware errors; multicore platforms; multithreaded execution deterministic; multithreaded software; operating system service; redundant multithreading; scheduling-induced nondeterminism; single-threaded applications; software-implemented fault tolerance mechanisms; transient hardware fault tolerance; triple-modular redundancy; Benchmark testing; Fault tolerance; Hardware; Instruction sets; Libraries; Multithreading; Synchronization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Embedded Software (EMSOFT), 2014 International Conference on
  • Conference_Location
    Jaypee Greens
  • Type
    conf
  • DOI
    10.1145/2656045.2656050
  • Filename
    6986127