Title :
SHiFA: System-level hierarchy in run-time fault-aware management of many-core systems
Author :
Fattah, Mohammad ; Palesi, Maurizio ; Liljeberg, Pasi ; Plosila, Juha ; Tenhunen, Hannu
Author_Institution :
Univ. of Turku, Turku, Finland
Abstract :
A system-level approach to fault-aware resource management of many-core systems is proposed. The proposed approach, called SHiFA, tolerates run-time faults at the system level without any hardware overhead. In contrast to existing system-level methods, network resources are also considered to be potentially faulty. Accordingly, applications are mapped onto healthy nodes of the system at run-time such that their interaction does not require the use of faulty elements. Using a simple routing approach, results show 100% utilizability of processing elements (PEs) and 99.41% successful mappings when up to 8 links are broken. The SHiFA design is based on distributed operating systems, which keeps it scalable for future many-core systems. A significant improvement in scalability is observed compared to state-of-the-art distributed approaches.
Keywords :
fault tolerant computing; multiprocessing systems; operating systems (computers); resource allocation; SHiFA design; distributed operating systems; fault-aware resource management; many-core systems; network resources; routing approach; run-time fault-aware management; system-level hierarchy; Circuit faults; Fault tolerance; Fault tolerant systems; Kernel; Mobile communication; Resource management; Routing; application mapping; hierarchical management; system-level design
Conference_Title :
Design Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE
Conference_Location :
San Francisco, CA
DOI :
10.1145/2593069.2593214