• DocumentCode
    1786872
  • Title
    SHiFA: System-level hierarchy in run-time fault-aware management of many-core systems
  • Author
    Fattah, Mohammad; Palesi, Maurizio; Liljeberg, Pasi; Plosila, Juha; Tenhunen, Hannu
  • Author_Institution
    Univ. of Turku, Turku, Finland
  • fYear
    2014
  • fDate
    1-5 June 2014
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    A system-level approach to fault-aware resource management of many-core systems is proposed. The approach, called SHiFA, tolerates run-time faults at the system level without any hardware overhead. In contrast to existing system-level methods, network resources are also considered potentially faulty. Accordingly, applications are mapped onto healthy nodes of the system at run-time such that their interaction does not require the use of faulty elements. Using a simple routing approach, results show 100% utilizability of processing elements (PEs) and 99.41% successful mappings when up to 8 links are broken. The SHiFA design is based on distributed operating systems, so that it remains scalable for future many-core systems. A significant improvement in scalability is observed compared to state-of-the-art distributed approaches.
  • Keywords
    fault tolerant computing; multiprocessing systems; operating systems (computers); resource allocation; SHiFA design; distributed operating systems; fault-aware resource management; many-core systems; network resources; routing approach; run-time fault-aware management; system-level hierarchy; Circuit faults; Fault tolerance; Fault tolerant systems; Kernel; Mobile communication; Resource management; Routing; application mapping; hierarchical management; system-level design
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Design Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE
  • Conference_Location
    San Francisco, CA
  • Type
    conf
  • DOI
    10.1145/2593069.2593214
  • Filename
    6881428