DocumentCode
1565387
Title
Multiple coherence and coordinated checkpointing protocols for DSM systems
Author
Boukerche, Azzedine ; Melo, Alba Cristina M A ; Koch, Jeferson G. ; Galdino, Cicero R.
Author_Institution
Sch. of Inf. Technol. & Eng., Ottawa Univ., Ont., Canada
fYear
2005
Firstpage
531
Lastpage
538
Abstract
In this article, we address two important issues in distributed shared memory (DSM) research: improving performance and providing reliability. To improve performance, we designed a low-overhead multiple coherence protocol mechanism, and to increase the reliability of the system, a coordinated checkpointing/recovery mechanism. Both mechanisms were implemented and incorporated into JIAJIA, a DSM system that implements scope consistency with a write-invalidate protocol. Our results on an eight-machine cluster running several popular benchmarks show that the multiple coherence protocol strategy significantly reduces the number of messages exchanged, leading to better performance. Our results for the checkpointing strategy also show that the overhead introduced in failure-free executions is small relative to the benefits obtained.
Keywords
checkpointing; distributed shared memory systems; message passing; protocols; workstation clusters; DSM systems; JIAJIA; checkpointing-recovery mechanism; computer cluster; coordinated checkpointing protocols; failure-free executions; messages exchange; multiple coherence protocol; reliability; write-invalidate protocol; Access protocols; Checkpointing; Coherence; Computer science; Fault tolerant systems; Information technology; Memory management; Parallel programming; Programming profession; Support vector machines;
fLanguage
English
Publisher
ieee
Conference_Titel
International Conference Workshops on Parallel Processing, 2005 (ICPP 2005 Workshops)
ISSN
1530-2016
Print_ISBN
0-7695-2381-1
Type
conf
DOI
10.1109/ICPPW.2005.57
Filename
1488739