• DocumentCode
    2174848
  • Title
    Deadlock Avoidance for Interconnection Networks with Multiple Dynamic Faults
  • Author
    Zarza, Gonzalo ; Lugones, Diego ; Franco, Daniel ; Luque, Emilio
  • Author_Institution
    Comput. Archit. & Oper. Syst. Dept., Univ. Autonoma of Barcelona, Barcelona, Spain
  • fYear
    2010
  • fDate
    17-19 Feb. 2010
  • Firstpage
    276
  • Lastpage
    280
  • Abstract
    The intensive and continuous use of high-performance computing systems for executing computationally intensive applications, coupled with the large number of elements that make them up, dramatically increases the likelihood of failures during their operation. Network faults have an extremely high impact because most routing algorithms are not designed to tolerate faults. In such algorithms, a single fault may lead to deadlocked configurations, thus preventing applications from completing correctly. This paper introduces a new deadlock avoidance mechanism for routing algorithms designed to deal with multiple dynamic faults. The mechanism is based on adding a small-sized buffer and applying a simple set of actions when accessing output buffers with limited free space. Unlike typical static solutions, this proposal allows the design of routing algorithms capable of handling an unbounded number of dynamic faults.
  • Keywords
    fault tolerant computing; multiprocessor interconnection networks; system recovery; deadlock avoidance mechanism; deadlocked configurations; failure likelihood; high performance computing systems; interconnection networks; multiple dynamic faults; routing algorithms; small sized buffer; Algorithm design and analysis; Computer applications; Computer architecture; Computer networks; Fault tolerance; Fault tolerant systems; Multiprocessor interconnection networks; Power system modeling; Routing; System recovery; adaptive routing; deadlock avoidance
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Parallel, Distributed and Network-Based Processing (PDP), 2010 18th Euromicro International Conference on
  • Conference_Location
    Pisa
  • ISSN
    1066-6192
  • Print_ISBN
    978-1-4244-5672-7
  • Electronic_ISBN
    1066-6192
  • Type
    conf
  • DOI
    10.1109/PDP.2010.82
  • Filename
    5452458