Title :
Exploiting data-flow for fault-tolerance in a wide-area parallel system
Author :
Nguyen-Tuong, Anh ; Grimshaw, Andrew S. ; Hyett, Mark
Author_Institution :
Dept. of Comput. Sci., Virginia Univ., Charlottesville, VA, USA
Abstract :
Wide-area parallel processing systems will soon be available to researchers to solve a range of problems. In these systems, it is certain that host failures and other faults will be a common occurrence. Unfortunately, most parallel processing systems have not been designed with fault-tolerance in mind. Mentat is a high-performance object-oriented parallel processing system that is based on an extension of the data-flow model. The functional nature of data-flow enables both parallelism and fault-tolerance. In this paper, we exploit the data-flow underpinning of Mentat to provide easy-to-use and transparent fault-tolerance. We present results on both a small-scale network and a wide-area heterogeneous environment that consists of three sites: the National Center for Supercomputing Applications, the University of Virginia, and the NASA Langley Research Center.
Keywords :
data flow computing; fault tolerant computing; multiprocessing systems; object-oriented programming; wide area networks; Mentat; NASA Langley Research Center; National Center for Supercomputing Applications; Virginia University; data-flow model; high-performance object-oriented parallel processing system; host failures; small-scale network; transparent fault tolerance; wide-area heterogeneous environment; wide-area parallel processing systems; Application software; Bandwidth; Concurrent computing; Distributed computing; Fault tolerance; Fault tolerant systems; Parallel processing; Software prototyping; Space technology; World Wide Web;
Conference_Title :
Reliable Distributed Systems, 1996. Proceedings., 15th Symposium on
Conference_Location :
Niagara-on-the-Lake, Ont.
Print_ISBN :
0-8186-7481-4
DOI :
10.1109/RELDIS.1996.559687