Title :
The Analysis of Checkpoint Strategies for Large-Scale CFD Simulation in HPC System
Author :
Ren Xiaoguang ; Xu Xinhai ; Tang Yuhua ; Fang Xudong
Author_Institution :
State Key Lab. of High Performance Comput., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
With advances in electronic technology, the processor count of a supercomputer has reached the million scale, making faults a fundamental issue for massively parallel CFD simulation. Checkpoint/rollback is a widely used fault-tolerance (FT) method, and it has an obvious effect on massively parallel applications. In this paper, we explore checkpointing for CFD simulation based on the features of CFD simulation, and analyze two checkpoint strategies: fine-granularity checkpoint and coarse checkpoint. We analyze the checkpoint intervals and the volume of the backup data, and model their impact on the FT overhead. Experimental results on the Tianhe-2 supercomputer demonstrate that the coarse checkpoint achieves a much better FT effect for CFD simulation.
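To illustrate how checkpoint interval and backup cost can trade off against FT overhead, here is a minimal sketch based on Young's classic first-order approximation. It is not the model used in the paper, which this record does not give. The function names, the parameter values, and the assumption that checkpoint cost is a fixed number of seconds are all hypothetical.

```python
import math


def young_interval(checkpoint_cost_s, mtbf_s):
    """Young's first-order estimate of the interval that minimizes overhead.

    checkpoint_cost_s: time to write one checkpoint (assumed constant).
    mtbf_s: mean time between failures of the whole system.
    """
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def overhead_fraction(interval_s, checkpoint_cost_s, mtbf_s):
    """First-order fraction of run time lost to fault tolerance.

    The first term is time spent writing checkpoints; the second is the
    expected recomputation after a failure (half an interval on average).
    Restart time is ignored in this sketch.
    """
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * mtbf_s)


if __name__ == "__main__":
    cost, mtbf = 50.0, 10000.0  # illustrative values only, not measured data
    best = young_interval(cost, mtbf)
    for interval in (best / 2, best, best * 2):
        print(interval, overhead_fraction(interval, cost, mtbf))
```

Under this simplified model, checkpointing either more or less often than the estimated interval raises the overhead, and a smaller checkpoint cost lowers it at every interval.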
Keywords :
checkpointing; computational fluid dynamics; data analysis; fault tolerant computing; parallel machines; program processors; FT overhead; HPC system; Tianhe-2 supercomputer; backup data checkpoint interval analysis; backup data volume analysis; checkpoint strategy analysis; checkpoint technology; coarse checkpoint; electronic technology; fault tolerant method; fine granularity checkpoint; large-scale parallel CFD simulation; processors; rollback technology; Benchmark testing; Computational fluid dynamics; Data models; Equations; Fault tolerance; Fault tolerant systems; Mathematical model; CFD; Checkpoint; Fault tolerant;
Conference_Title :
Communication Systems and Network Technologies (CSNT), 2014 Fourth International Conference on
Conference_Location :
Bhopal
Print_ISBN :
978-1-4799-3069-2
DOI :
10.1109/CSNT.2014.224