Title :
Fault recovery protocol for distributed memory MPSoCs
Author :
Barreto, Francisco F. S. ; Amory, Alexandre M. ; Moraes, Fernando G.
Author_Institution :
FACIN, PUCRS, Porto Alegre, Brazil
Abstract :
Fault handling mechanisms become more relevant as systems integrate more hardware logic. For instance, current multi-processor system-on-chips (MPSoCs) consist of hundreds of processors connected by an interconnection network. This type of system can only be cost-effective if it can handle faults in its main components (i.e., processors and interconnect). Traditional fault recovery approaches for multi-processors were adapted from the domain of computer clusters and might be more complex than required for common MPSoC application domains. This paper presents a lightweight online fault recovery protocol for embedded processors of MPSoCs based on distributed memory. This approach automatically restarts affected applications, reallocating their tasks to healthy processors. All steps are performed at the kernel level, without changing user application code. Results show very short recovery times, from 110 μs to 425 μs with a 100 MHz clock, which are mostly dominated by the size of the reallocated tasks.
Keywords :
embedded systems; multiprocessor interconnection networks; protocols; system-on-chip; distributed memory MPSoC; embedded processors; fault recovery protocol; interconnection network; multi-processor system-on-chips; Bandwidth; Context; Fault tolerance; Fault tolerant systems; Hardware; Program processors; Protocols; MPSoCs; distributed memory; fault recovery; many-cores; task migration;
Conference_Title :
Circuits and Systems (ISCAS), 2015 IEEE International Symposium on
Conference_Location :
Lisbon
DOI :
10.1109/ISCAS.2015.7168660