• DocumentCode
    1270904
  • Title

    The EFTOS approach to dependability in embedded supercomputing

  • Author

    Deconinck, Geert ; De Florio, Vincenzo ; Varvarigou, Theodora A. ; Verentziotis, Evangelos

  • Author_Institution
    Dept. Elektrotechniek, Katholieke Univ., Leuven, Belgium
  • Volume
    51
  • Issue
    1
  • fYear
    2002
  • fDate
    3/1/2002 12:00:00 AM
  • Firstpage
    76
  • Lastpage
    90
  • Abstract
    Industrial embedded supercomputing applications benefit from a systematic approach to fault tolerance. The EFTOS (embedded fault-tolerant supercomputing) framework provides a flexible and adaptable set of fault-tolerance tools from which the application developer can choose to make an embedded application on a parallel or distributed system more dependable. A high-level description (recovery language) helps the developer specify the fault-tolerance strategies of the application as a second application layer; this separates functional from fault-tolerance aspects of an application, thus shortening the development cycle and improving maintainability. The framework incorporates a backbone (to hook a set of fault-tolerance tools onto, and to coordinate the fault-tolerance actions) and a presentation layer (to monitor and test the fault-tolerance behavior). A practical implementation is described with its performance evaluation, using an industrial case study from the energy-transport area, as well as an analytic deduction of the appropriateness of fault-tolerance techniques for various application profiles.
  • Keywords
    embedded systems; fault tolerant computing; parallel machines; EFTOS; development cycle; energy-transport area; fault tolerance; fault-tolerant communication; industrial embedded supercomputing applications; maintainability; performance evaluation; presentation layer; software-based fault tolerance; stable memory; Application software; Costs; Electrical equipment industry; Fault tolerance; Fault tolerant systems; Hardware; Monitoring; Redundancy; Spine; Testing
  • fLanguage
    English
  • Journal_Title
    Reliability, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9529
  • Type

    jour

  • DOI
    10.1109/24.994916
  • Filename
    994916