Title :
The robust middleware approach for transparent and systematic fault tolerance in parallel and distributed systems
Author_Institution :
Dept. of Electr. & Comput. Eng., Queen's Univ., Kingston, Ont.
Abstract :
We propose the robust middleware approach to transparent fault tolerance in parallel and distributed systems. The approach inserts a robust middleware layer between algorithms/programs and the system architecture/hardware. The middleware makes hardware faults transparent to algorithms/programs, so ordinary algorithms/programs developed for fault-free networks can run on faulty parallel/distributed systems without modification. Moreover, the robust middleware automatically adds fault tolerance capability to ordinary algorithms/programs, so no hardware redundancy or reconfiguration capability is required, and no assumption is made about the availability of a complete subnetwork (of lower dimension or smaller size). We also propose nomadic agent multithreaded programming as a novel fault-aware programming paradigm that is independent of network topologies and fault patterns. Nomadic agent multithreaded programming adapts to fault, traffic, and workload patterns and can take advantage of various components of the robust middleware, including its fault tolerance features and multiple embeddings, without relying on specialized robust algorithms.
Keywords :
fault tolerant computing; middleware; multi-threading; multiprocessing systems; distributed system; fault-aware programming paradigm; fault-free network; nomadic agent multithreaded programming; parallel system; systematic fault tolerance; transparent fault tolerance; Availability; Computer architecture; Fault tolerance; Fault tolerant systems; Hardware; Middleware; Redundancy; Robustness; Software algorithms; Switches
Conference_Title :
Parallel Processing, 2003. Proceedings. 2003 International Conference on
Conference_Location :
Kaohsiung
Print_ISBN :
0-7695-2017-0
DOI :
10.1109/ICPP.2003.1240566