DocumentCode :
1800921
Title :
Data storage optimization of application-level checkpointing on heterogeneous systems
Author :
Jia Jia ; Wei Song
Author_Institution :
National Laboratory for Parallel and Distributed Processing, School of Computer, National University of Defense Technology, Changsha, China
fYear :
2013
fDate :
1-8 Jan. 2013
Firstpage :
1
Lastpage :
6
Abstract :
The appearance of general-purpose GPUs (GPGPUs) has made heterogeneous computing practical and has reshaped both general-purpose GPU computing and parallel computing. Heterogeneous systems have been adopted by large-scale high-performance computers. Fault tolerance is necessary for scientific computing at this scale, but in the few years since GPGPUs and heterogeneous systems appeared, no effective fault tolerance method has emerged. This paper therefore applies a traditional fault tolerance technique, application-level checkpointing, to heterogeneous systems. Because the main way to reduce the overhead of application-level checkpointing is to reduce the checkpoint data size, we analyze heterogeneous systems and GPGPU programs, propose a method to optimize the data storage of application-level checkpointing, and validate the optimization experimentally.
Keywords :
Checkpointing; Fault tolerance; Fault tolerant systems; Graphics processing units; Hardware; Kernel; Optimization; application-level checkpointing; fault tolerance method; general purpose GPU; heterogeneous system;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Conference Anthology, IEEE
Conference_Location :
China
Type :
conf
DOI :
10.1109/ANTHOLOGY.2013.6784773
Filename :
6784773