Author_Institution :
Dept. of Comput. Eng., Nat. Inst. of Technol., Kurukshetra, India
Abstract :
The flexibility offered by mobile agents is quite noticeable in distributed computing environments. However, the greater flexibility of the mobile agent paradigm compared to the client/server computing paradigm comes with additional threats, since agent systems are prone to failures originating from bad communication, security attacks, agent server crashes, system resource unavailability, network congestion, or even deadlock situations. In such events, mobile agents either get lost or are damaged (partially or totally) during execution. In this paper, we propose a parallel checkpointing approach based on the use of antecedence graphs for providing fault tolerance in mobile agent systems. During normal computation, as messages are transmitted, the dependency information among mobile agents is recorded in the form of antecedence graphs by the participating mobile agents of a mobile agent group. When a checkpointing procedure begins, the initiator concurrently informs the relevant mobile agents, which minimizes the identifying time. The proposed scheme uses the checkpointed information, which is stored in the form of antecedence graphs, for fault tolerance. In case of failure, the antecedence graphs and message logs are regenerated from the checkpointed information for recovery, and normal operation then continues. Moreover, compared with existing schemes, our algorithm involves the minimum number of mobile agents during the identifying and checkpointing procedure, which improves system performance. In addition, the proposed algorithm is a domino-free checkpointing algorithm, which is especially desirable for mobile agent systems. Quantitative analysis and experimental simulation show that our algorithm outperforms other coordinated checkpointing schemes in terms of the identifying time and the number of blocked mobile agents, and can therefore provide better system performance.
The main contribution of the proposed checkpointing scheme is an enhancement of the graph-based approach that yields considerable improvement by reducing message overhead, execution time, and recovery time.
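As an illustration of the idea summarized in the abstract, the following is a minimal sketch, not the paper's algorithm. It assumes that each delivered message adds a dependency edge from receiver to sender, and that the initiator finds the "relevant" agents by following those edges transitively before notifying them. The names `AntecedenceGraph`, `record_message`, and `relevant_agents` are hypothetical and do not come from the paper.

```python
class AntecedenceGraph:
    """Hypothetical dependency record built up during message transmission."""

    def __init__(self):
        # Maps each receiver to the set of agents it has received messages from.
        self.depends_on = {}

    def record_message(self, sender, receiver):
        # Assumption: receiving a message makes the receiver depend on the sender.
        self.depends_on.setdefault(receiver, set()).add(sender)

    def relevant_agents(self, initiator):
        # Depth-first walk over dependency edges, starting at the initiator.
        # The result is the set of agents that would need to take part in the
        # checkpoint under the assumption above; unrelated agents are left out.
        seen = {initiator}
        stack = [initiator]
        while stack:
            agent = stack.pop()
            for dep in self.depends_on.get(agent, ()):
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        return seen


graph = AntecedenceGraph()
graph.record_message("A", "B")  # B depends on A
graph.record_message("B", "C")  # C depends on B
graph.record_message("D", "E")  # unrelated pair of agents
involved = graph.relevant_agents("C")
```

In this sketch the initiator `C` would involve only `A`, `B`, and itself, while `D` and `E` continue unblocked. Because the whole set is known before any notification is sent, the agents could be informed concurrently rather than one hop at a time.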
Keywords :
distributed processing; formal verification; graph theory; mobile agents; parallel processing; software fault tolerance; agent server crashes; antecedence graph approach; bad communication; client/server computing; deadlock situations; distributed computing environments; fault tolerance checkpointing; message logs; message transmission; mobile agent systems; network congestion; parallel checkpointing approach; security attacks; Checkpointing; Fault tolerance; Fault tolerant systems; Mobile agents; Protocols; Servers; Mobile agents; antecedence graphs; checkpointing; failure; fault tolerance; message logs; reliability;