Title :
Demonstrating HW–SW Transient Error Mitigation on the Single-Chip Cloud Computer Data Plane
Author :
Rodopoulos, Dimitrios ; Papanikolaou, Antonis ; Catthoor, Francky ; Soudris, Dimitrios
Author_Institution :
Micro Lab., Nat. Tech. Univ. of Athens, Athens, Greece
Abstract :
Transient errors are a major concern for the correct operation of low-level cache memories. Aggressive integration requires effective mitigation of such errors, without extreme overheads in power, timing, or silicon area. We demonstrate a hybrid (hardware-software) scheme that mitigates bit flips in data that reside in low-level caches. The methodology is shown to be applicable in streaming applications and we illustrate that with a video decoding case study on a state-of-the-art many-core chip. The single-chip cloud computer is an experimental processor created by Intel Labs. Dedicated on-chip memories are utilized to keep safe copies for key application data, thus allowing rollbacks upon error detection. The experimental results illustrate the tradeoff between application delay, consumed energy, and output fidelity as the injected errors are corrected. When output fidelity is considered as a hard constraint, application slack used for mitigation can be reclaimed with dynamic frequency scaling. Output fidelity is guaranteed regardless of the error injection intensity and the application's timing constraints are respected up to a certain upper bound of error injection.
Keywords :
cache storage; cloud computing; error detection; microprocessor chips; video coding; HW-SW transient error mitigation; Intel Labs; application timing constraints; dynamic frequency scaling; hybrid hardware-software scheme; low-level cache memories; many-core chip; on-chip memories; single-chip cloud computer data plane; video decoding; Decoding; Error analysis; Transform coding; Transient analysis; Dynamic frequency scaling (DFS); Joint Photographic Experts Group (JPEG) format; Motion JPEG (MJPEG); single-chip cloud computer (SCC); transient errors
Journal_Title :
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
DOI :
10.1109/TVLSI.2014.2309663