Title :
Multiple coherence and coordinated checkpointing protocols for DSM systems
Author :
Boukerche, Azzedine ; Melo, Alba Cristina M A ; Koch, Jeferson G. ; Galdino, Cicero R.
Author_Institution :
Sch. of Inf. Technol. & Eng., Ottawa Univ., Ont., Canada
Abstract :
In this article, we address two important issues in DSM research: improving performance and providing reliability. To improve performance, we designed a low-overhead multiple coherence protocol mechanism; to augment the reliability of the system, we propose a coordinated checkpointing/recovery mechanism. Both mechanisms were implemented and incorporated in JIAJIA, a DSM system that implements scope consistency with a write-invalidate protocol. Our results on an eight-machine cluster with some popular benchmarks show that the multiple coherence protocol strategy significantly reduces the number of messages exchanged, leading to better performance. Our results for the checkpointing strategy show that the overhead introduced in failure-free executions is small relative to the benefits obtained.
Keywords :
checkpointing; distributed shared memory systems; message passing; protocols; workstation clusters; DSM systems; JIAJIA; checkpointing-recovery mechanism; computer cluster; coordinated checkpointing protocols; failure-free executions; messages exchange; multiple coherence protocol; reliability; write-invalidate protocol; Access protocols; Checkpointing; Coherence; Computer science; Fault tolerant systems; Information technology; Memory management; Parallel programming; Programming profession; Support vector machines;
Conference_Title :
Parallel Processing, 2005. ICPP 2005 Workshops. International Conference Workshops on
Print_ISBN :
0-7695-2381-1
DOI :
10.1109/ICPPW.2005.57