Title :
Intelligent Agents for Fault Tolerance: From Multi-agent Simulation to Cluster-Based Implementation
Author :
Varghese, Blesson ; McKee, Gerard ; Alexandrov, Vassil
Author_Institution :
Sch. of Syst. Eng., Univ. of Reading, Reading, UK
Abstract :
Recent research in multi-agent systems incorporates fault tolerance concepts but does not explore the extension and implementation of such ideas for large-scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely 'Intelligent Agents'. A task to be executed on a parallel computing system is decomposed into sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information in the event of a predicted core/processor failure and to complete the task successfully. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator, and by an implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.
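The idea summarized in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation, which runs on a cluster using the Message Passing Interface; it only simulates the pattern in plain Python. The names Agent, decompose, migrate and reduce_sum, the ring-style choice of the next healthy core, and the use of summation as the reduction are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical agent carrying one sub-task (a slice of the input)."""
    agent_id: int
    data: list
    core: int            # index of the simulated core the agent sits on
    partial: float = 0.0


def decompose(values, n_agents):
    """Split values into n_agents contiguous, near-equal chunks."""
    size, extra = divmod(len(values), n_agents)
    chunks, start = [], 0
    for i in range(n_agents):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def migrate(agent, failing, n_cores):
    """Move the agent to the next core not predicted to fail (assumed policy)."""
    for step in range(1, n_cores):
        candidate = (agent.core + step) % n_cores
        if candidate not in failing:
            agent.core = candidate
            return
    raise RuntimeError("no healthy core available")


def reduce_sum(values, n_cores, predicted_failures=frozenset()):
    """Sum values with one agent per core, evacuating cores predicted to fail."""
    chunks = decompose(values, n_cores)
    agents = [Agent(i, chunk, core=i) for i, chunk in enumerate(chunks)]

    for agent in agents:
        if agent.core in predicted_failures:
            migrate(agent, predicted_failures, n_cores)
        agent.partial = sum(agent.data)

    # Pairwise (tree) combination of the partial results.
    partials = [a.partial for a in agents]
    while len(partials) > 1:
        paired = [partials[i] + partials[i + 1]
                  for i in range(0, len(partials) - 1, 2)]
        if len(partials) % 2:
            paired.append(partials[-1])
        partials = paired
    return partials[0], [a.core for a in agents]


if __name__ == "__main__":
    total, cores = reduce_sum(list(range(1, 9)), 4, predicted_failures={2})
    print(total, cores)
```

In this sketch the agent that starts on core 2 moves to core 3 before computing its partial sum, so the reduction still completes even though one core is marked as about to fail.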
Keywords :
message passing; multi-agent systems; parallel algorithms; pattern clustering; software fault tolerance; FPGA; abstracted hardware layer; cluster-based implementation; fault tolerance; intelligent agents; message passing interface; multiagent simulator; multiagent systems; parallel computing system; parallel reduction algorithm; swarm array computing approach; Computational modeling; Computer simulation; Fault tolerance; Fault tolerant systems; Field programmable gate arrays; Hardware; Intelligent agent; Large-scale systems; Multiagent systems; Parallel processing; swarm-array computing
Conference_Title :
Advanced Information Networking and Applications Workshops (WAINA), 2010 IEEE 24th International Conference on
Conference_Location :
Perth, WA
Print_ISBN :
978-1-4244-6701-3
DOI :
10.1109/WAINA.2010.21