Title :
Fault Tolerant Parallel FFT Using Parallel Failure Recovery
Author :
Fu, Hongyi; Yang, Xuejun
Author_Institution :
National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, China
fDate :
June 29 - July 2, 2009
Abstract :
This paper introduces a new fault-tolerance method for parallel programs based on parallel failure recovery. When a process fails, the surviving processes compute the failed process's task in parallel, which reduces the overhead of fault tolerance. The paper presents the design and implementation of a parallel FFT using this approach and determines the optimal number of processes that should participate in parallel failure recovery. Finally, an experiment shows that parallel failure recovery outperforms checkpointing and demonstrates the effectiveness of the proposed solution for choosing the best number of processes participating in parallel failure recovery.
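The abstract describes two ideas: splitting a failed process's work among several survivors, and choosing how many survivors should help. The sketch below illustrates both under stated assumptions; it is not the authors' method. The helper names `split_task`, `recovery_time`, and `best_helper_count` are hypothetical, and the cost model (shared work divided by k, plus a per-helper coordination overhead growing linearly in k) is an assumed toy model chosen only to show why an optimum k exists. It does not implement the FFT itself.

```python
def split_task(n_items: int, k: int) -> list[range]:
    """Split the failed process's n_items work units into k contiguous,
    nearly equal ranges, one per helper (hypothetical helper)."""
    base, extra = divmod(n_items, k)
    ranges, start = [], 0
    for i in range(k):
        # The first `extra` helpers take one additional unit each.
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def recovery_time(k: int, work: float, overhead: float) -> float:
    """ASSUMED toy cost model, not taken from the paper: the failed task's
    work is shared by k helpers, and each helper adds a fixed
    coordination/communication overhead."""
    return work / k + overhead * k


def best_helper_count(survivors: int, work: float, overhead: float) -> int:
    """Pick the k in 1..survivors that minimizes the assumed recovery time
    by brute force over all candidate helper counts."""
    return min(range(1, survivors + 1),
               key=lambda k: recovery_time(k, work, overhead))


if __name__ == "__main__":
    print([list(r) for r in split_task(10, 3)])
    print(best_helper_count(15, work=1000.0, overhead=10.0))
```

Under this toy model the continuous optimum is k = sqrt(work / overhead), so with work 1000 and overhead 10 the best choice is 10 helpers. With only 5 survivors the choice is capped at 5, since adding helpers still lowers the time up to that point. Brute force is used because the helper count is a small integer bounded by the number of surviving processes.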
Keywords :
checkpointing; fast Fourier transforms; fault tolerant computing; parallel programming; fault tolerance; fault tolerant parallel FFT; parallel failure recovery; parallel program; Aerospace industry; Application software; Books; Computational geometry; Computer networks; Conferences; Grid computing; High performance computing; Physics computing
Conference_Title :
Computational Science and Its Applications, 2009. ICCSA '09. International Conference on
Conference_Location :
Yongin
Print_ISBN :
978-0-7695-3701-6
DOI :
10.1109/ICCSA.2009.36