DocumentCode
1106994
Title
Commercial fault tolerance: a tale of two systems
Author
Bartlett, Wendy ; Spainhower, Lisa
Author_Institution
Hewlett Packard, Cupertino, CA, USA
Volume
1
Issue
1
fYear
2004
Firstpage
87
Lastpage
96
Abstract
This paper compares and contrasts the design philosophies and implementations of two computer system families: the IBM S/360 and its evolution to the current zSeries line, and the Tandem (now HP) NonStop® Server. Both systems have a long history; the initial IBM S/360 machines were shipped in 1964, and the Tandem NonStop System was first shipped in 1976. Both were aimed at similar markets, namely what would today be called enterprise-class applications. The requirement for the original S/360 line was for very high availability; the requirement for the NonStop platform was for single fault tolerance against unplanned outages. Since their initial shipments, availability expectations for both platforms have continued to rise and the system designers and developers have been challenged to keep up. There were and still are many similarities in the design philosophies of the two lines, including the use of redundant components and extensive error checking. The primary difference is that the S/360-zSeries focus has been on localized retry and restore to keep processors functioning as long as possible, while the NonStop developers have based systems on a loosely coupled multiprocessor design that supports a "fail-fast" philosophy implemented through a combination of hardware and software, with workload being actively taken over by another resource when one fails.
Keywords
business data processing; data integrity; error handling; fault tolerant computing; hardware-software codesign; multiprocessing systems; system recovery; IBM S/360; Tandem NonStop® Server; commercial fault tolerance; computer system families; computer systems implementation; enterprise-class applications; error checking; high availability; zSeries line; Availability; Business; Computer Society; Delay; Fault tolerance; Fault tolerant systems; Hardware; History; Manufacturing; Stock markets
fLanguage
English
Journal_Title
Dependable and Secure Computing, IEEE Transactions on
Publisher
IEEE
ISSN
1545-5971
Type
jour
DOI
10.1109/TDSC.2004.4
Filename
1335469