Title :
Managing multi-core soft-error reliability through utility-driven cross domain optimization
Author :
Zhang, Wangyuan ; Li, Tao
Author_Institution :
Dept. of Electr. & Comput. Eng., Florida Univ., Gainesville, FL
Abstract :
As semiconductor processing technology continues to scale down, managing reliability becomes an increasingly difficult challenge in high-performance microprocessor design. Transient faults, also known as soft errors, corrupt program data at the circuit level and cause incorrect program execution and system crashes. Future processors will consist of billions of transistors organized as multicore microarchitectures. Packaging multiple cores (and hence more transistors) onto the same die exposes more devices to soft error strikes. This paper explores utility-function-driven (benefit-driven) cross-domain optimization for both performance and reliability. We propose the use of utility-based resource management for individual cores while applying utility-based shared cache partitioning across multiple cores. Moreover, we coordinate the optimization of multiple resources based on their cross-domain utility information to achieve attractive performance and reliability tradeoffs. Extensive experimental results show that, on average, our utility-driven cross-domain optimization reduces the soft error rate of the most vulnerable core in a chip multiprocessor (CMP) by up to 35% and improves the CMP's overall reliability by 22% with less than 3% performance degradation across 15 investigated workloads.
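As an illustration only, utility-based shared cache partitioning is commonly realized by handing out cache ways one at a time to the core whose utility would grow the most from one more way. The sketch below assumes hypothetical per-core utility curves (utility as a function of allocated ways) and a hypothetical function name, partition_ways; it does not reproduce the paper's own utility function, which according to the abstract weighs both performance and reliability.

```python
from typing import List


def partition_ways(utility: List[List[float]], total_ways: int) -> List[int]:
    """Greedily assign shared-cache ways to cores by largest marginal utility.

    Hypothetical sketch: utility[c][w] is the utility core c obtains when it
    holds w ways, for w = 0..total_ways. Every core is guaranteed one way.
    This is a generic greedy scheme, not the paper's specific algorithm.
    """
    n = len(utility)
    if total_ways < n:
        raise ValueError("need at least one way per core")

    alloc = [1] * n  # start each core with one way
    for _ in range(total_ways - n):
        # Marginal gain of giving one additional way to each core.
        gains = [utility[c][alloc[c] + 1] - utility[c][alloc[c]] for c in range(n)]
        # Pick the core with the largest gain (ties go to the lowest index).
        best = max(range(n), key=lambda c: gains[c])
        alloc[best] += 1
    return alloc


# Two cores sharing a 4-way cache; core 0 benefits more from extra ways.
print(partition_ways([[0, 10, 18, 24, 28], [0, 5, 9, 12, 14]], 4))  # [3, 1]
```

When every utility curve is concave (diminishing returns per added way), this greedy allocation maximizes total utility; for non-concave curves a lookahead variant is typically needed instead.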
Keywords :
circuit optimisation; error statistics; integrated circuit reliability; microprocessor chips; chip multiprocessor; high-performance microprocessor design; multi-core microarchitectures; multi-core soft-error reliability; semiconductor processing technology; soft error rate; soft error strikes; transient faults; utility-based resource management; utility-based shared cache partitioning; utility-driven cross domain optimization; utility-function-driven cross domain optimization; Circuit faults; Computer crashes; Error analysis; Microarchitecture; Microprocessors; Multicore processing; Packaging; Resource management; Semiconductor device reliability; Technology management;
Conference_Title :
Application-Specific Systems, Architectures and Processors, 2008. ASAP 2008. International Conference on
Conference_Location :
Leuven
Print_ISBN :
978-1-4244-1897-8
Electronic_ISBN :
2160-0511
DOI :
10.1109/ASAP.2008.4580167