DocumentCode :
3025189
Title :
Performance of fault-tolerant distributed shared memory on broadcast- and switch-based architectures
Author :
Katsinis, Constantine
Author_Institution :
Dept. of Electr. Comput. Eng., Drexel Univ., Philadelphia, PA, USA
fYear :
2005
fDate :
4-8 April 2005
Abstract :
This paper presents a set of distributed-shared-memory (DSM) protocols that provide fault tolerance on broadcast-based and switch-based architectures with no decrease in performance. These augmented DSM protocols combine the data duplication required by fault tolerance with the data duplication that arises naturally in distributed-shared-memory implementations. The recovery memory at each backup node is kept consistent at all times and is accessible to all processes executing at that node. Simulation results show that the additional data duplication necessary to create fault-tolerant DSM causes no reduction in system performance during normal operation and eliminates most of the overhead at checkpoint creation. Data blocks that are duplicated to maintain the recovery memory are also used by the DSM protocol, which reduces network traffic and significantly increases processor utilization. We use simulation and multiprocessor address trace files to compare the performance of a broadcast architecture called the SOME-Bus with that of two representative switch architectures.
Keywords :
checkpointing; distributed shared memory systems; fault tolerance; parallel architectures; DSM protocol; broadcast-based architecture; checkpoint creation; distributed shared memory; memory recovery; multiprocessor address trace files; network traffic; switch-based architecture; system performance; Access protocols; Broadcasting; Fault tolerant systems; Multiprocessor interconnection networks; Network topology; Operating systems; Routing; Switches; Telecommunication traffic
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International
Print_ISBN :
0-7695-2312-9
Type :
conf
DOI :
10.1109/IPDPS.2005.340
Filename :
1420209