DocumentCode
568637
Title
Transient Fault Tolerance for ccNUMA Architecture
Author
Xingjun Zhang ; Endong Wang ; Feilong Tang ; Meishun Yang ; Hengyi Wei ; Xiaoshe Dong
Author_Institution
Dept. of Comput. Sci. & Technol., Xi'an Jiaotong Univ., Xi'an, China
fYear
2012
fDate
4-6 July 2012
Firstpage
197
Lastpage
202
Abstract
Transient faults are a critical concern for the reliability of microprocessor systems. Software fault tolerance is more flexible and less costly than hardware fault tolerance. Moreover, as architectural trends point toward multi-core designs, there is substantial interest in adapting parallel and redundant hardware resources for transient fault tolerance. This paper proposes a process-level fault tolerance technique, a software-centric approach that efficiently schedules and synchronizes redundant processes by exploiting the processor redundancy of ccNUMA systems. This improves the running efficiency of the redundant processes and reduces time and space overhead. The paper focuses on methods for detecting and handling errors in the redundant processes. A real prototype, designed to be transparent to the application, has been implemented. Test results show that the system detects, in a timely manner, CPU and memory soft errors that cause exceptions in the redundant processes, while ensuring that application services remain uninterrupted and are only briefly delayed.
Keywords
delays; error detection; error handling; memory architecture; multiprocessing systems; processor scheduling; redundancy; software fault tolerance; synchronisation; CPU; ccNUMA architecture; ccNUMA processor redundancy; delay; error detection method; error handling method; microprocessor system; multicore design; parallel resource; process level fault tolerance; processor scheduling; prototype; reliability; soft error detection; software centric approach; synchronization; transient fault tolerance; Fault tolerant systems; Hardware; Kernel; Redundancy; Synchronization; Transient analysis; Transient fault; ccNUMA; dual-process;
fLanguage
English
Publisher
IEEE
Conference_Titel
Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS), 2012 Sixth International Conference on
Conference_Location
Palermo
Print_ISBN
978-1-4673-1328-5
Type
conf
DOI
10.1109/IMIS.2012.188
Filename
6296854