Title :
Error Adaptive Classifier Boosting (EACB): Leveraging Data-Driven Training Towards Hardware Resilience for Signal Inference
Author :
Wang, Zhuo ; Schapire, Robert E. ; Verma, Naveen
Author_Institution :
Dept. of Electr. Eng., Princeton Univ., Princeton, NJ, USA
Abstract :
The continued scaling of CMOS technologies and the consideration of post-CMOS technologies have elevated hardware reliability to a first-class challenge, particularly in energy- and resource-constrained embedded sensor applications. In such applications, there is an increasing emphasis on inference functions. Machine-learning algorithms play an important role by enabling the construction of data-driven models for inference over data that is too complex to model analytically. This paper explores how data-driven training can also be exploited to overcome computational errors due to hardware faults within an inference stage. FPGA emulation with randomized fault injections shows that the proposed architecture restores system performance to the level of a fault-free system, with 1% of the hardware requiring explicit fault protection and with digital faults affecting >2% of the circuit nodes in the rest of the hardware. To train an error-aware inference model, a training algorithm is presented whose hardware (memory) and energy requirements are reduced by 65× and 10× compared to previously reported algorithms (AdaBoost and FilterBoost, respectively), thereby enabling model construction entirely on the device.
Keywords :
CMOS integrated circuits; error analysis; fault diagnosis; field programmable gate arrays; inference mechanisms; integrated circuit reliability; learning (artificial intelligence); pattern classification; CMOS technology scaling; EACB; FPGA emulation; circuit node; complementary metal oxide semiconductor technology; computational error; data-driven training; digital fault; error adaptive classifier boosting; error-aware inference model; fault free system; fault protection; field programmable gate array; hardware fault; hardware reliability; hardware resilience; inference function; machine-learning algorithm; randomized fault injection; signal inference; Boosting; Brain models; Circuit faults; Hardware; Memory management; Training; Circuit reliability; fault tolerance; pattern recognition
Journal_Title :
Circuits and Systems I: Regular Papers, IEEE Transactions on
DOI :
10.1109/TCSI.2015.2395591