Title :
Handling Soft Error in Embedded Software for Networking System
Author :
Zhu, Haihong Henry
Author_Institution :
Cisco Syst., San Jose, CA, USA
Abstract :
Single event upset (SEU) is a well-known and well-documented phenomenon that affects electronic circuitry. These events are caused by either atmospheric neutrons or alpha particles emitted by trace impurities in the silicon processing and packaging materials. An error in device output or operation caused by an SEU is called a soft error. A soft error is not a software defect; it is a hardware data corruption that does not involve permanent chip damage. Soft errors can lead to catastrophic failures in embedded systems. Because of their nature, soft errors are almost impossible to prevent. The recommended handling, chosen according to impact severity, is to detect and correct them; these techniques are called mitigation methodologies. The mitigation strategies are implemented in the embedded software of networking systems. This paper presents a comprehensive framework for SEU mitigation methodologies for networking systems. To achieve this goal, we start by defining the SEU mitigation strategy as a combination of chip-level methods and system-level handling methods. Given a particular chip-level or system-level mitigation choice, we propose first categorizing the SEU Failure In Time (FIT) into different time-window bins based on SEU recovery time. We then analyze the impact of each mitigation strategy, which results in a change in the FIT value in each bin. This framework enables engineers to do the SEU mitigation design in the early product development phase. A user-friendly Excel tool has also been developed to make the complicated model easy to use. An embedded system such as a networking device can be modeled using the tool at an early stage to support design decisions and trade-offs related to potentially costly implementation.
Keywords :
electronic engineering computing; electronics packaging; radiation hardening (electronics); reliability; SEU failure in time; SEU mitigation design; alpha particle; atmospheric neutron; chip level method; electronic circuitry; embedded software; hardware data corruption; networking system; packaging material; silicon processing; single event upset; soft error; system level handling method; trace impurities; user-friendly Excel tool; Error analysis; Error correction codes; Field programmable gate arrays; Monitoring; Random access memory; Single event upsets; SEU
Conference_Title :
Software Reliability Engineering Workshops (ISSREW), 2014 IEEE International Symposium on
Conference_Location :
Naples
DOI :
10.1109/ISSREW.2014.109