  • DocumentCode
    1106994
  • Title
    Commercial fault tolerance: a tale of two systems
  • Author
    Bartlett, Wendy; Spainhower, Lisa
  • Author_Institution
    Hewlett-Packard, Cupertino, CA, USA
  • Volume
    1
  • Issue
    1
  • fYear
    2004
  • Firstpage
    87
  • Lastpage
    96
  • Abstract
    This paper compares and contrasts the design philosophies and implementations of two computer system families: the IBM S/360 and its evolution to the current zSeries line, and the Tandem (now HP) NonStop® Server. Both systems have a long history; the initial IBM S/360 machines were shipped in 1964, and the Tandem NonStop System was first shipped in 1976. They were aimed at similar markets, what would today be called enterprise-class applications. The requirement for the original S/360 line was for very high availability; the requirement for the NonStop platform was for single fault tolerance against unplanned outages. Since their initial shipments, availability expectations for both platforms have continued to rise and the system designers and developers have been challenged to keep up. There were and still are many similarities in the design philosophies of the two lines, including the use of redundant components and extensive error checking. The primary difference is that the S/360-zSeries focus has been on localized retry and restore to keep processors functioning as long as possible, while the NonStop developers have based systems on a loosely coupled multiprocessor design that supports a "fail-fast" philosophy implemented through a combination of hardware and software, with workload being actively taken over by another resource when one fails.
  • Keywords
    business data processing; data integrity; error handling; fault tolerant computing; hardware-software codesign; multiprocessing systems; system recovery; IBM S/360; Tandem NonStop® Server; commercial fault tolerance; computer system families; computer systems implementation; enterprise-class applications; error checking; high availability; zSeries line; Availability; Business; Computer Society; Delay; Fault tolerance; Fault tolerant systems; Hardware; History; Manufacturing; Stock markets
  • fLanguage
    English
  • Journal_Title
    Dependable and Secure Computing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5971
  • Type
    jour
  • DOI
    10.1109/TDSC.2004.4
  • Filename
    1335469