• DocumentCode
    3027561
  • Title
    Fault Tolerant Implementation of Peer-to-peer Distributed Iterative Algorithms
  • Author
    Nguyen, The Tung; El-Baz, Didier
  • Author_Institution
    LAAS, Toulouse, France
  • fYear
    2012
  • fDate
    5-7 Dec. 2012
  • Firstpage
    137
  • Lastpage
    145
  • Abstract
    Fault tolerance issues related to the implementation of distributed iterative algorithms via the P2PDC peer-to-peer distributed computing environment are considered. P2PDC is a decentralized environment dedicated to task-parallel applications. It has been designed in particular for the solution of large-scale numerical simulation problems via distributed iterative algorithms. The environment allows frequent and direct communications between peers, i.e., machines. P2PDC is based on P2PSAP, a self-adaptive communication protocol. We present new functionalities of P2PDC aimed at making our environment more robust. An adaptive fault tolerance mechanism ensures that the computation remains robust to peer faults. We also consider fault tolerance from an algorithmic point of view: we concentrate in particular on distributed asynchronous iterative algorithms that can tolerate some message loss. A series of computational results is presented and analyzed for a numerical simulation problem.
  • Keywords
    iterative methods; parallel processing; peer-to-peer computing; protocols; software fault tolerance; P2PDC peer-to-peer distributed computing environment; P2PSAP; adaptive fault tolerance mechanism; decentralized environment; direct communications; distributed asynchronous iterative algorithms; fault tolerant implementation; file sharing; frequent communications; message loss; numerical simulation problems; parallel applications; peer-to-peer distributed iterative algorithms; peer-to-peer self-adaptive communication protocol; Checkpointing; Fault tolerance; Fault tolerant systems; Iterative methods; Peer to peer computing; Resource management; Topology; distributed computing; fault tolerance; numerical simulation; peer to peer computing; task parallel model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Science and Engineering (CSE), 2012 IEEE 15th International Conference on
  • Conference_Location
    Nicosia
  • Print_ISBN
    978-1-4673-5165-2
  • Electronic_ISBN
    978-0-7695-4914-9
  • Type
    conf
  • DOI
    10.1109/ICCSE.2012.103
  • Filename
    6417286