DocumentCode :
2589210
Title :
Using Redundant Threads for Fault Tolerance of OpenMP Programs
Author :
Fu, Hongyi ; Ding, Yan
Author_Institution :
Key Lab. of Sci. & Technol. for Nat. Defence of Parallel & Distrib. Process., Nat. Univ. of Defence Tech., Changsha, China
fYear :
2010
fDate :
21-23 April 2010
Firstpage :
1
Lastpage :
8
Abstract :
With the wide adoption of multi-core processor architectures in high-performance computing, fault tolerance for shared-memory parallel programs has become a hot spot of research. For years, checkpointing has been the dominant fault tolerance technique in this field, and much recent research has been devoted to it. However, for programs that deal with large amounts of data, checkpointing may induce massive I/O transfers, which adversely affect scalability. To address this problem, this paper proposes a redundancy-based fault tolerance approach for shared-memory parallel programs. Our scheme avoids saving and restoring computational state during the program's execution and therefore involves no I/O operations, which gives it a clear scalability advantage over checkpointing. In this paper, we introduce our approach and the related compiler tool in detail and present the results of an experimental evaluation.
Keywords :
fault tolerant computing; parallel processing; redundancy; shared memory systems; compiler tool; fault tolerance; multi-core processor architecture; redundant threads; shared memory parallel programs;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Science and Applications (ICISA), 2010 International Conference on
Conference_Location :
Seoul
Print_ISBN :
978-1-4244-5941-4
Electronic_ISBN :
978-1-4244-5943-8
Type :
conf
DOI :
10.1109/ICISA.2010.5480321
Filename :
5480321