DocumentCode :
3315250
Title :
Fault recovery protocol for distributed memory MPSoCs
Author :
Barreto, Francisco F. S. ; Amory, Alexandre M. ; Moraes, Fernando G.
Author_Institution :
FACIN, PUCRS, Porto Alegre, Brazil
fYear :
2015
fDate :
24-27 May 2015
Firstpage :
421
Lastpage :
424
Abstract :
Fault handling mechanisms become more relevant as systems integrate more hardware logic. For instance, current multi-processor systems-on-chip (MPSoCs) consist of hundreds of processors connected by an interconnection network. This type of system can only be cost-effective if it can handle faults in its main components (i.e., processors and interconnect). Traditional fault recovery approaches for multi-processors were adapted from the domain of computer clusters and might be more complex than required for common MPSoC application domains. This paper presents a lightweight online fault recovery protocol for embedded processors of MPSoCs based on distributed memory. This approach automatically restarts affected applications, reallocating tasks to healthy processors. All steps are performed at the kernel level, without changing user application code. Results show very short recovery times, from 110 μs to 425 μs with a 100 MHz clock, dominated mostly by the size of the reallocated tasks.
Keywords :
embedded systems; multiprocessor interconnection networks; protocols; system-on-chip; distributed memory MPSoC; embedded processors; fault recovery protocol; interconnection network; multi-processor system-on-chips; Bandwidth; Context; Fault tolerance; Fault tolerant systems; Hardware; Program processors; Protocols; MPSoCs; distributed memory; fault recovery; many-cores; task migration;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Circuits and Systems (ISCAS), 2015 IEEE International Symposium on
Conference_Location :
Lisbon
Type :
conf
DOI :
10.1109/ISCAS.2015.7168660
Filename :
7168660