• DocumentCode
    1332656
  • Title

    Fault-tolerant adaptive and minimal routing in mesh-connected multicomputers using extended safety levels

  • Author

    Wu, Jie

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Florida Atlantic Univ., Boca Raton, FL, USA
  • Volume
    11
  • Issue
    2
  • fYear
    2000
  • fDate
    2/1/2000 12:00:00 AM
  • Firstpage
    149
  • Lastpage
    159
  • Abstract
    The minimal routing problem in mesh-connected multicomputers with faulty blocks is studied. Two-dimensional meshes are used to illustrate the approach. A sufficient condition for minimal routing in 2D meshes with faulty blocks is proposed. Unlike many traditional models that assume all the nodes know global fault distribution, our approach is based on the concept of an extended safety level, which is a special form of limited fault information. The extended safety level information is captured by a vector associated with each node. When the safety level of a node reaches a certain level (or meets certain conditions), a minimal path exists from this node to any nonfaulty node in 2D meshes. Specifically, we study the existence of minimal paths at a given source node, limited distribution of fault information, and minimal routing itself. We propose three fault-tolerant minimal routing algorithms which are adaptive to allow all messages to use any minimal path. We also provide some general ideas to extend our approaches to other low-dimensional mesh-connected multicomputers such as 2D tori and 3D meshes. Our approach is the first attempt to address adaptive and minimal routing in 2D meshes with faulty blocks using limited fault information.
  • Keywords
    fault tolerant computing; hypercube networks; network routing; 2D tori; 3D meshes; adaptive routing; extended safety levels; fault-tolerant adaptive routing; fault-tolerant minimal routing algorithms; mesh-connected multicomputers; minimal paths; minimal routing; minimal routing problem; sufficient condition; two-dimensional meshes; Communication networks; Communication switching; Costs; Fault tolerance; Hypercubes; Network topology; Routing; Safety; Sufficient conditions; Telecommunication network reliability
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/71.841751
  • Filename
    841751