• DocumentCode
    2283327
  • Title
    A Low Overhead Fault Tolerant Coherence Protocol for CMP Architectures
  • Author
    Fernández-Pascual, Ricardo ; García, José M. ; Acacio, Manuel E. ; Duato, Jose
  • Author_Institution
    Dpto. Ingenieria y Tecnologia de Computadores, Univ. de Murcia
  • fYear
    2007
  • fDate
    10-14 Feb. 2007
  • Firstpage
    157
  • Lastpage
    168
  • Abstract
    It is widely accepted that transient failures will appear more frequently in chips designed in the near future due to several factors such as the increased integration scale. On the other hand, chip multiprocessors (CMPs), which integrate several processor cores in a single chip, are nowadays the best alternative for making more efficient use of the increasing number of transistors that can be placed on a single die. Hence, it is necessary to design new techniques to deal with these faults in order to build sufficiently reliable CMPs. In this work, we present a coherence protocol aimed at dealing with transient failures that affect the interconnection network of a CMP, thus assuming that the network is no longer reliable. In particular, our proposal extends a token-based cache coherence protocol so that no data can be lost and no deadlock can occur due to any dropped message. Using the GEMS full-system simulator, we compare our proposal against a similar protocol without fault tolerance (TOKENCMP). We show that, in the absence of failures, our proposal does not introduce overhead in terms of increased execution time over TOKENCMP. Additionally, our protocol can tolerate message loss rates much higher than those likely to be found in the real world without increasing execution time by more than 15%.
  • Keywords
    cache storage; fault tolerance; microprocessor chips; multiprocessing systems; parallel architectures; protocols; CMP architecture; GEMS full system simulator; TOKENCMP; chip-multiprocessors; fault tolerant coherence protocol; interconnection network; single chip; token-based cache coherence protocol; Computer architecture; Electromagnetic interference; Electromagnetic radiation; Electromagnetic transients; Electronic components; Energy consumption; Fault tolerance; Multiprocessor interconnection networks; Proposals; Protocols;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computer Architecture, 2007. HPCA 2007. IEEE 13th International Symposium on
  • Conference_Location
    Scottsdale, AZ
  • Print_ISBN
    1-4244-0805-9
  • Electronic_ISBN
    1-4244-0805-9
  • Type
    conf
  • DOI
    10.1109/HPCA.2007.346194
  • Filename
    4147657