• DocumentCode
    2062067
  • Title

    Functional Tests of the RADIC Fault Tolerance Architecture

  • Author

    Duarte, Angelo ; Rexachs, Dolores ; Luque, Emilio

  • Author_Institution
    Comput. Archit. & Oper. Syst. Dept., Univ. Autonoma de Barcelona
  • fYear
    2007
  • fDate
    7-9 Feb. 2007
  • Firstpage
    278
  • Lastpage
    287
  • Abstract
    Clusters with thousands of nodes are a reality, and the current trend indicates that they are becoming larger. Such large clusters are subject to a relatively high fault frequency, so a fault-tolerance scheme is mandatory to ensure that applications complete correctly. Message passing is the programming model often used in large clusters, yet the current implementations that provide fault tolerance in message-passing systems do not focus on an architecture that simultaneously addresses scalability, transparency, and independence from stable/central elements. The RADIC architecture was proposed and designed as a fully distributed structure to meet these requirements. It defines a fully distributed fault tolerance controller implemented by a set of system processes that collaborate to perform all the basic functions of a fault tolerance protocol. This paper presents the test methodology used to verify the functionality of the RADIC architecture using RADICMPI, a prototype built on MPI semantics.
  • Keywords
    fault tolerant computing; message passing; parallel architectures; RADIC fault tolerance architecture; RADICMPI; fully distributed structure; functional tests; message passing; programming model; Collaboration; Control systems; Distributed control; Fault tolerance; Fault tolerant systems; Frequency; Message passing; Protocols; Scalability; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel, Distributed and Network-Based Processing, 2007. PDP '07. 15th EUROMICRO International Conference on
  • Conference_Location
    Napoli
  • ISSN
    1066-6192
  • Print_ISBN
    0-7695-2784-1
  • Type

    conf

  • DOI
    10.1109/PDP.2007.45
  • Filename
    4135288