  • DocumentCode
    2571066
  • Title
    Experimental assessment of parallel systems
  • Author
    Silva, João Gabriel; Carreira, João; Madeira, Henrique; Costa, Diamantino; Moreira, Francisco
  • Author_Institution
    Dept. of Eng. Inf., Coimbra Univ., Portugal
  • fYear
    1996
  • fDate
    25-27 Jun 1996
  • Firstpage
    415
  • Lastpage
    424
  • Abstract
    In the research reported in this paper, transient faults were injected, using software fault injection, into the nodes and the communication subsystem of a commercial parallel machine running several real applications. The results showed that a significant percentage of faults caused the system to produce wrong results while the application seemed to terminate normally. This demonstrates that fault tolerance techniques are required in parallel systems, not only to ensure that long-running applications terminate but also, and more importantly, to ensure that the results they produce are correct. Of the techniques tested to reduce the percentage of undetected wrong results, only algorithm-based fault tolerance (ABFT) proved effective. For other simple error detection methods to be effective, they have to be designed in, not added as an afterthought. Faults injected in the communication subsystem demonstrated the effectiveness of end-to-end CRCs on data movements between processors.
  • Keywords
    parallel machines; parallel programming; program debugging; program testing; software fault tolerance; ABFT; commercial parallel machine; communication subsystem; error detection methods; long-running applications; parallel systems assessment; processor data movement; software fault injection; transient fault injection; Application software; Computer crashes; Cyclic redundancy check; Fault tolerant systems; Hardware; Parallel machines; Preventive maintenance; Software maintenance; Testing; Web sites
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the Annual Symposium on Fault Tolerant Computing, 1996
  • Conference_Location
    Sendai
  • ISSN
    0731-3071
  • Print_ISBN
    0-8186-7262-5
  • Type
    conf
  • DOI
    10.1109/FTCS.1996.534627
  • Filename
    534627
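The abstract credits end-to-end CRCs on data movements between processors with catching the faults injected into the communication subsystem. "End-to-end" means the checksum is computed where the data originates and verified where it is finally consumed, so corruption anywhere along the path is covered. The following is a minimal Python sketch of that idea, assuming a hypothetical framing in which the sender appends a CRC-32 to each message and the receiver verifies it. The function names `send_with_crc` and `receive_with_crc` are illustrative only; the record does not describe the paper's actual mechanism on the parallel machine.

```python
import zlib

CRC_BYTES = 4  # CRC-32 trailer length (hypothetical framing choice)


def send_with_crc(payload: bytes) -> bytes:
    """Sender side (hypothetical): append a big-endian CRC-32 of the payload."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + crc.to_bytes(CRC_BYTES, "big")


def receive_with_crc(frame: bytes) -> bytes:
    """Receiver side (hypothetical): recompute the CRC and reject corrupted frames."""
    if len(frame) < CRC_BYTES:
        raise ValueError("frame too short to contain a CRC")
    payload, trailer = frame[:-CRC_BYTES], frame[-CRC_BYTES:]
    expected = int.from_bytes(trailer, "big")
    if zlib.crc32(payload) & 0xFFFFFFFF != expected:
        raise ValueError("end-to-end CRC mismatch: data corrupted in transit")
    return payload


if __name__ == "__main__":
    frame = send_with_crc(b"partial result from node 3")
    assert receive_with_crc(frame) == b"partial result from node 3"

    # Simulate a transient single-bit fault somewhere between sender and receiver.
    corrupted = bytearray(frame)
    corrupted[0] ^= 0x01
    try:
        receive_with_crc(bytes(corrupted))
    except ValueError as err:
        print("detected:", err)
```

Because the check is performed by the communicating endpoints rather than by individual links, it also covers faults in intermediate buffers and routing software, which per-hop checks would miss. CRC-32 is guaranteed to detect any single-bit error, so the simulated fault above is always caught.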