• DocumentCode
    2485011
  • Title

    Fault tolerance for an embedded wormhole switched network

  • Author

    Hotchkiss, R. ; O'Neill, B.C. ; Clark, S.

  • Author_Institution
    Dept. of Electr. & Electron. Eng., Nottingham Trent Univ., UK
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    79
  • Lastpage
    83
  • Abstract
    The effectiveness of parallel and distributed systems depends heavily upon the reliability and efficiency of the method used for information transfer. To satisfy these requirements, the communication medium must supply fault tolerance throughout the communication layers while minimising operational overheads. The work described relates to a scalable communication system for a distributed-memory parallel processing architecture, which is constructed with message routing switches. The system employs a hardware mechanism, local to each physical connection, that provides a distributed solution for fault detection and isolation. By isolating faults and using adaptive routing algorithms, networks may be designed that maintain operability in the presence of faults. An explanation of the basic switch and fault isolation mechanism is provided. The paper concludes with implementation details of the operational hardware and of the environment in which it has been tested.
  • Keywords
    distributed memory systems; fault tolerant computing; message passing; multiprocessor interconnection networks; network routing; parallel architectures; adaptive routing algorithms; distributed systems; distributed-memory parallel processing architecture; embedded wormhole switched network; fault detection; fault tolerance; information transfer; message routing switches; operational overhead; parallel systems; scalable communication system; Adaptive systems; Algorithm design and analysis; Communication switching; Fault detection; Fault tolerance; Hardware; Parallel processing; Routing; Switches; Telecommunication network reliability;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Computing in Electrical Engineering, 2000. PARELEC 2000. Proceedings. International Conference on
  • Conference_Location
    Trois-Rivières, Que.
  • Print_ISBN
    0-7695-0759-X
  • Type

    conf

  • DOI
    10.1109/PCEE.2000.873606
  • Filename
    873606