• DocumentCode
    2589210
  • Title

    Using Redundant Threads for Fault Tolerance of OpenMP Programs

  • Author

    Fu, Hongyi ; Ding, Yan

  • Author_Institution
    Key Lab. of Sci. & Technol. for Nat. Defence of Parallel & Distrib. Process., Nat. Univ. of Defence Tech., Changsha, China
  • fYear
    2010
  • fDate
    21-23 April 2010
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    With the wide application of multi-core processor architectures in high performance computing, fault tolerance for shared memory parallel programs has become a research hot spot. For years, checkpointing has been the dominant fault tolerance technology in this field, and much recent research has been devoted to it. However, for programs that deal with large amounts of data, checkpointing may induce massive I/O transfer, which adversely affects scalability. To address this problem, this paper proposes a redundancy-based fault tolerance approach for shared memory parallel programs. Our scheme avoids saving and restoring computational state during the program's execution and hence does not involve I/O operations, which gives it a clear scalability advantage over checkpointing. We describe the approach and the related compiler tool in detail and present experimental evaluation results.
  • Keywords
    fault tolerant computing; parallel processing; redundancy; shared memory systems; compiler tool; fault tolerance; multi-core processor architecture; redundant threads; shared memory parallel programs
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Science and Applications (ICISA), 2010 International Conference on
  • Conference_Location
    Seoul
  • Print_ISBN
    978-1-4244-5941-4
  • Electronic_ISBN
    978-1-4244-5943-8
  • Type

    conf

  • DOI
    10.1109/ICISA.2010.5480321
  • Filename
    5480321