DocumentCode :
1925481
Title :
A symmetric O(n log n) message distributed snapshot algorithm for large-scale systems
Author :
Kshemkalyani, Ajay D.
Author_Institution :
Comput. Sci. Dept., Univ. of Illinois at Chicago, Chicago, IL, USA
fYear :
2009
fDate :
Aug. 31 2009-Sept. 4 2009
Firstpage :
1
Lastpage :
4
Abstract :
This paper presents an O(n log n)-message distributed snapshot algorithm for systems with non-FIFO channels, where n is the number of processors. The algorithm applies to checkpointing in large-scale supercomputers and in distributed systems that have a fully connected logical topology over a large number of processors. Each processor sends log n messages. The message sizes are geometrically distributed, and the sum of the sizes of the messages sent by any processor is n. The response time of the algorithm is O(log n). The algorithm is fully distributed and the role of each processor is symmetric, unlike in tree-based, ring-based, and centralized algorithms.
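The cost figures in the abstract (log n messages per processor, geometrically growing message sizes, O(log n) rounds, n log n messages in total) match the shape of a recursive-doubling exchange on a hypercube, a term that also appears in the keywords. The Python sketch below illustrates only that cost pattern. It is not the author's snapshot algorithm. It assumes n is a power of two and synchronous rounds. The function name `hypercube_allgather` is hypothetical, and each "record" is a placeholder for whatever local state a snapshot would collect.

```python
def hypercube_allgather(n):
    """Recursive-doubling exchange over n = 2**d hypothetical processors.

    Returns (knowledge, sizes): knowledge[i] is the set of processor IDs whose
    placeholder record processor i holds after all rounds; sizes[i] lists the
    size (number of records) of each message processor i sent, one per round.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two")
    d = n.bit_length() - 1  # number of rounds = log2(n)

    # Each processor starts out holding only its own record.
    knowledge = [{i} for i in range(n)]
    sizes = [[] for _ in range(n)]

    for k in range(d):
        # Synchronous round k: copy the pre-round state so every send in this
        # round carries what the sender knew at the start of the round.
        outgoing = [set(s) for s in knowledge]
        for i in range(n):
            partner = i ^ (1 << k)  # hypercube neighbor along dimension k
            sizes[i].append(len(outgoing[i]))  # 2**k records in round k
            knowledge[partner] |= outgoing[i]
    return knowledge, sizes


if __name__ == "__main__":
    n = 16
    knowledge, sizes = hypercube_allgather(n)
    print(all(len(s) == n for s in knowledge))  # every processor holds all n records
    print(sizes[0])                             # message sizes sent by processor 0
    print(sum(len(s) for s in sizes))           # total messages, n * log2(n)
```

In this sketch each processor sends one message per round, so it sends log2(n) messages in total. Those messages carry 1, 2, 4, ..., n/2 records, which sums to n - 1. The exact size accounting in the paper may differ. Every processor runs the same code against a different partner, so no processor has a special role. That is the symmetry the abstract contrasts with tree-based, ring-based, and centralized schemes.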
Keywords :
checkpointing; computational complexity; distributed processing; large-scale systems; mainframes; large scale supercomputers; message distributed snapshot; non-FIFO channels; Application software; Computer science; Costs; Delay; Hypercubes; Supercomputers; Topology; Tree graphs
fLanguage :
English
Publisher :
ieee
Conference_Title :
Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on
Conference_Location :
New Orleans, LA
ISSN :
1552-5244
Print_ISBN :
978-1-4244-5011-4
Electronic_ISBN :
1552-5244
Type :
conf
DOI :
10.1109/CLUSTR.2009.5289139
Filename :
5289139