• DocumentCode
    3175521
  • Title
    Validation of the fault/error handling mechanisms of the Teraflops supercomputer
  • Author
    Constantinescu, C.
  • Author_Institution
    Server Archit. Lab., Intel Corp., Hillsboro, OR, USA
  • fYear
    1998
  • fDate
    23-25 June 1998
  • Firstpage
    382
  • Lastpage
    389
  • Abstract
    The Teraflops system, the world's most powerful supercomputer, was developed by Intel Corporation for the US Department of Energy (DOE) as part of the Accelerated Strategic Computing Initiative (ASCI). The machine contains more than 9000 Intel Pentium® Pro processors and performs over one trillion floating point operations per second. Complex hardware and software mechanisms were devised for complying with DOE's reliability requirements. This paper gives a brief description of the Teraflops system architecture and presents the validation of the fault/error handling mechanisms. The validation process was based on an enhanced version of the physical fault injection at the IC pin level. An original approach was developed for assessing signal sensitivity to transient faults and the effectiveness of the fault tolerance mechanisms. Several malfunctions were unveiled by the fault injection experiments. After corrective actions had been undertaken, the supercomputer performed according to the specification.
  • Keywords
    fault tolerant computing; parallel architectures; parallel machines; system recovery; Accelerated Strategic Computing Initiative; Intel Pentium; Teraflops supercomputer; Teraflops system architecture; corrective actions; fault injection experiments; fault tolerance mechanisms; fault/error handling mechanisms; transient faults; Acceleration; Computer architecture; Fault tolerance; Fault tolerant systems; Hardware; Laboratories; Software performance; Supercomputers; Tellurium; US Department of Energy;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fault-Tolerant Computing, 1998. Digest of Papers. Twenty-Eighth Annual International Symposium on
  • Conference_Location
    Munich, Germany
  • ISSN
    0731-3071
  • Print_ISBN
    0-8186-8470-4
  • Type
    conf
  • DOI
    10.1109/FTCS.1998.689489
  • Filename
    689489