DocumentCode
2589210
Title
Using Redundant Threads for Fault Tolerance of OpenMP Programs
Author
Fu, Hongyi ; Ding, Yan
Author_Institution
Key Lab. of Sci. & Technol. for Nat. Defence of Parallel & Distrib. Process., Nat. Univ. of Defence Tech., Changsha, China
fYear
2010
fDate
21-23 April 2010
Firstpage
1
Lastpage
8
Abstract
With the wide adoption of multi-core processor architectures in high-performance computing, fault tolerance for shared-memory parallel programs has become an active research topic. For years, checkpointing has been the dominant fault tolerance technique in this field, and much recent research has focused on it. However, for programs that handle large amounts of data, checkpointing may induce massive I/O traffic, which adversely affects scalability. To address this problem, this paper proposes a redundancy-based fault tolerance approach for shared-memory parallel programs. Our scheme avoids saving and restoring computational state during the program's execution and therefore involves no I/O operations, giving it a clear scalability advantage over checkpointing. In this paper, we describe our approach and the related compiler tool in detail and present the experimental evaluation results.
Keywords
fault tolerant computing; parallel processing; redundancy; shared memory systems; compiler tool; fault tolerance; multi-core processor architecture; redundant threads; shared memory parallel programs;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Science and Applications (ICISA), 2010 International Conference on
Conference_Location
Seoul
Print_ISBN
978-1-4244-5941-4
Electronic_ISBN
978-1-4244-5943-8
Type
conf
DOI
10.1109/ICISA.2010.5480321
Filename
5480321