• DocumentCode
    3053361
  • Title

    Algorithm-based fault tolerance for many-core architectures

  • Author

    Braun, Claus ; Wunderlich, Hans-Joachim

  • Author_Institution
    Inst. of Comput. Archit. & Comput. Eng., Univ. of Stuttgart, Stuttgart, Germany
  • fYear
    2010
  • fDate
    24-28 May 2010
  • Firstpage
    253
  • Lastpage
    253
  • Abstract
    Modern many-core architectures with hundreds of cores provide high computational potential. This makes them particularly interesting for scientific high-performance computing and simulation technology. Like all nanoscale semiconductor devices, many-core processors are prone to reliability-harming factors such as variations and soft errors. One way to improve the reliability of such systems is software-based hardware fault tolerance. Here, the software is able to detect and correct errors introduced by the hardware. In this work, we propose a software-based approach to improve the reliability of matrix operations on many-core processors. These operations are key components in many scientific applications.
  • Keywords
    multiprocessing systems; parallel architectures; software fault tolerance; high performance scientific computing; high performance scientific simulation; many-core architectures; many-core processors; matrix operation reliability; nanoscaled semiconductor devices; software based hardware fault tolerance; Computational modeling; Computer architecture; Encoding; Error correction; Fault tolerance; Fault tolerant systems; Hardware; Reliability engineering; Semiconductor devices; Yarn;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Test Symposium (ETS), 2010 15th IEEE European
  • Conference_Location
    Praha
  • ISSN
    1530-1877
  • Print_ISBN
    978-1-4244-5834-9
  • Electronic_ISBN
    1530-1877
  • Type

    conf

  • DOI
    10.1109/ETSYM.2010.5512738
  • Filename
    5512738