Regional Information Center for Science and Technology - Reconfiguration in octagonal mesh-based multicomputer systems with distributed checkpointing

DocumentCode :

3444221

Title :

Reconfiguration in octagonal mesh-based multicomputer systems with distributed checkpointing

Author :

Bauch, Andreas ; Maehle, Erik

Author_Institution :

Univ. Gesamthochschule Paderborn, Germany

fYear :

1994

fDate :

12-14 Jun 1994

Firstpage :

169

Lastpage :

180

Abstract :

In large multicomputer systems, fault tolerance can no longer be neglected. Dynamic redundancy is a suitable approach for implementing fault tolerance in mesh-based systems. One major problem is reconfiguring the interconnection network after a fault. This paper presents two reconfiguration schemes for octagonal mesh-based multicomputer systems that are closely related to the distributed checkpointing approach. The first scheme can reconfigure a 2D mesh (the application graph) embedded in an octagonal 2D mesh (the machine graph) after a single fault, provided the checkpoints are organized as a meander. This reconfiguration is achieved with a dilation of 2 and a congestion of 2. The second scheme reconfigures any application graph that was originally embedded in an octagonal mesh (the machine graph) with a congestion of 1, assuming that the checkpoints are organized as a spiral and that only single faults occur. In this case, the dilation increases by a factor of 2 for a square mesh and by a factor of 4 for a rectangular one, while the congestion is 3 in both cases. Practical experience with a sample implementation of the first reconfiguration scheme, and of a scheme for more general application graphs, on the DAMP multicomputer system is also reported.
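
To make the terms "meander" and "dilation" concrete, the following Python sketch may help. It is not the authors' algorithm, which the abstract does not specify. It assumes that an octagonal mesh links each node to its eight nearest neighbors (so the hop distance is the Chebyshev distance), that one spare row is available, and that recovery simply shifts tasks one step along the meander past the faulty node. All function names are hypothetical.

```python
from itertools import product


def meander_order(rows, cols):
    """Snake (boustrophedon) ordering of the nodes of a rows x cols mesh."""
    order = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in cs)
    return order


def octagonal_distance(a, b):
    """Hop count in an 8-neighbor mesh (assumed model): Chebyshev distance."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def mesh_edges(rows, cols):
    """Edges of a 4-neighbor 2D mesh used here as the application graph."""
    for r, c in product(range(rows), range(cols)):
        if r + 1 < rows:
            yield (r, c), (r + 1, c)
        if c + 1 < cols:
            yield (r, c), (r, c + 1)


def dilation(edges, placement):
    """Largest machine distance between the images of two adjacent tasks."""
    return max(octagonal_distance(placement[u], placement[v]) for u, v in edges)


def shift_along_meander(placement, order, faulty):
    """Assumed recovery rule: every task on or after the faulty node
    moves one position forward along the meander."""
    pos = {node: i for i, node in enumerate(order)}
    k = pos[faulty]
    return {
        task: (order[pos[node] + 1] if pos[node] >= k else node)
        for task, node in placement.items()
    }


# 3 x 3 application mesh on a 4 x 3 machine mesh (last row is spare).
edges = list(mesh_edges(3, 3))
placement = {node: node for node in product(range(3), range(3))}
order = meander_order(4, 3)
recovered = shift_along_meander(placement, order, faulty=(0, 0))

print(dilation(edges, placement))  # dilation before the fault
print(dilation(edges, recovered))  # dilation after the shift
```

In this toy case the dilation grows from 1 to 2 after the shift, which is in line with the bound quoted in the abstract, although the paper's actual scheme and its congestion analysis may differ.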

Keywords :

fault tolerant computing; multiprocessor interconnection networks; reconfigurable architectures; 2D-mesh; application graph; distributed checkpointing; dynamic redundancy; fault tolerance; interconnection network; multicomputer systems; octagonal mesh-based; reconfiguration; Checkpointing; Distributed computing; Fault tolerance; Fault tolerant systems; Message passing; Multiprocessor interconnection networks; Redundancy; Spirals; Supercomputers; Very large scale integration

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Proceedings of IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems, 1994

Conference_Location :

College Station, TX

Print_ISBN :

0-8186-6807-5

Type :

conf

DOI :

10.1109/FTPDS.1994.494488

Filename :

494488

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3444221