• DocumentCode
    3207336
  • Title
    Supporting highly-decoupled thread-level redundancy for parallel programs
  • Author
    Rashid, M. Wasiur ; Huang, Michael C.
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Rochester, Rochester, NY
  • fYear
    2008
  • fDate
    16-20 Feb. 2008
  • Firstpage
    393
  • Lastpage
    404
  • Abstract
    The continued scaling of device dimensions and the operating voltage reduces the critical charge and thus natural noise tolerance level of transistors. As a result, circuits can produce transient upsets that corrupt program execution and data. Redundant execution can detect and correct circuit errors on the fly. The increasing prevalence of multi-core architectures makes coarse-grain thread-level redundancy (TLR) very attractive. While TLR has been extensively studied in the context of single-threaded applications, much less attention is paid to the design issues and tradeoffs of supporting parallel codes. In this paper, we propose a microarchitecture to efficiently support TLR for parallel codes. One of the main design goals is to support a large number of unverified instructions, so that long latencies in verification can be easily tolerated. Another important objective is to have a comprehensive coverage that includes not only the computation logic but also the coherence and consistency logic in the memory subsystem. Hence, the redundant copy of the program needs to independently access the memory and the system needs to efficiently manage the non-determinism in parallel execution. The proposed architectural support to achieve these goals is entirely off the processor critical path and can be easily disabled when redundancy is not requested. The design, with a few effective optimizations, is also efficient in that during error-free execution, it causes less than 3% additional performance degradation on top of throughput loss due to redundancy.
  • Keywords
    microprocessor chips; multi-threading; power aware computing; redundancy; transistors; coarse-grain thread-level redundancy; coherence logic; computation logic; consistency logic; natural noise tolerance level; parallel programs; thread-level redundancy; Circuit noise; Delay; Error correction; Logic; Memory management; Microarchitecture; Noise level; Noise reduction; Voltage
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computer Architecture, 2008. HPCA 2008. IEEE 14th International Symposium on
  • Conference_Location
    Salt Lake City, UT
  • ISSN
    1530-0897
  • Print_ISBN
    978-1-4244-2070-4
  • Type
    conf
  • DOI
    10.1109/HPCA.2008.4658655
  • Filename
    4658655