Regional Information Center for Science and Technology (RICEST) - Revisiting the Double Checkpointing Algorithm

DocumentCode :

1995941

Title :

Revisiting the Double Checkpointing Algorithm

Author :

Dongarra, Jack ; Herault, Thomas ; Robert, Yves

Author_Institution :

Univ. of Tennessee, Knoxville, TN, USA

fYear :

2013

fDate :

20-24 May 2013

Firstpage :

706

Lastpage :

715

Abstract :

Fast checkpointing algorithms require distributed access to stable storage. This paper revisits the approach based upon double checkpointing, and compares the blocking algorithm of Zheng, Shi and Kalé with the non-blocking algorithm of Ni, Meneses and Kalé, in terms of both performance and risk. We also extend their proposed model to assess the impact of the overhead associated with non-blocking communications. We then provide a new peer-to-peer checkpointing algorithm, called the triple checkpointing algorithm, that can work at constant memory and achieves both higher efficiency and better risk handling than the double checkpointing algorithm. We provide performance and risk models for all the evaluated protocols, and compare them through comprehensive simulations.

Keywords :

parallel processing; blocking algorithm; distributed access; double checkpointing algorithm; fast checkpointing algorithms; nonblocking communications; parallel computing environments; peer-to-peer checkpointing algorithm; Algorithm design and analysis; Checkpointing; Computational modeling; Equations; Peer-to-peer computing; Protocols; Reliability; checkpoint; in-memory checkpoint; performance model; scheduling

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International

Conference_Location :

Cambridge, MA

Print_ISBN :

978-0-7695-4979-8

Type :

conf

DOI :

10.1109/IPDPSW.2013.11

Filename :

6650947

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1995941