DocumentCode
3027561
Title
Fault Tolerant Implementation of Peer-to-peer Distributed Iterative Algorithms
Author
Nguyen, The Tung; El-Baz, Didier
Author_Institution
LAAS, Toulouse, France
fYear
2012
fDate
5-7 Dec. 2012
Firstpage
137
Lastpage
145
Abstract
Fault tolerance issues related to the implementation of distributed iterative algorithms via the P2PDC peer-to-peer distributed computing environment are considered. P2PDC is a decentralized environment dedicated to task-parallel applications. It has been designed in particular for the solution of large-scale numerical simulation problems via distributed iterative algorithms. The environment allows frequent and direct communications between peers, i.e., machines. P2PDC is based on P2PSAP, a self-adaptive communication protocol. We present new functionalities of P2PDC aimed at making the environment more robust. An adaptive fault tolerance mechanism keeps the computation robust in the presence of peer faults. We also consider fault tolerance from an algorithmic point of view: we concentrate in particular on distributed asynchronous iterative algorithms that can tolerate some message loss. A series of computational results is presented and analyzed for a numerical simulation problem.
Keywords
iterative methods; parallel processing; peer-to-peer computing; protocols; software fault tolerance; P2PDC peer-to-peer distributed computing environment; P2PSAP; adaptive fault tolerance mechanism; decentralized environment; direct communications; distributed asynchronous iterative algorithms; fault tolerant implementation; file sharing; frequent communications; message loss; numerical simulation problems; parallel applications; peer-to-peer distributed iterative algorithms; peer-to-peer self-adaptive communication protocol; Checkpointing; Fault tolerance; Fault tolerant systems; Iterative methods; Peer to peer computing; Resource management; Topology; distributed computing; fault tolerance; numerical simulation; peer to peer computing; task parallel model;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Science and Engineering (CSE), 2012 IEEE 15th International Conference on
Conference_Location
Nicosia
Print_ISBN
978-1-4673-5165-2
Electronic_ISBN
978-0-7695-4914-9
Type
conf
DOI
10.1109/ICCSE.2012.103
Filename
6417286