DocumentCode
2173008
Title
Improving yield and reliability of chip multiprocessors
Author
Pan, Abhisek ; Khan, Omer ; Kundu, Sandip
Author_Institution
Univ. of Massachusetts, Amherst, MA
fYear
2009
fDate
20-24 April 2009
Firstpage
490
Lastpage
495
Abstract
An increasing number of hardware failures can be attributed to device reliability problems that cause partial system failure or shutdown. In this paper we propose a scheme for improving the reliability of a homogeneous chip multiprocessor (CMP) that also serves to improve manufacturing yield. Our solution exploits the natural redundancy that already exists in multi-core systems by letting a faulty core use services from other cores in place of its own defective functional units. A micro-architectural modification allows a core on a CMP to use another core as a coprocessor to service any instruction that the former cannot execute correctly. This service improves yield and reliability at the cost of some loss of performance. To quantify this loss, we used a cycle-accurate simulator to evaluate the performance of a dual-core system in which one or both cores sustain partial failure. Our results indicate that when a large, sparingly used unit such as a floating-point arithmetic unit fails in a core, each faulty core can continue to run with help from companion cores, even for a floating-point-intensive benchmark, with as little as 10% performance impact and less than 1% area overhead.
Keywords
floating point arithmetic; integrated circuit reliability; integrated circuit yield; microprocessor chips; multiprocessing systems; cycle-accurate simulator; device reliability problem; dual-core system; faulty core; floating point arithmetic unit; hardware failure; homogeneous chip multiprocessor; manufacturing yield; microarchitectural modification; multicore system; natural redundancy; partial system failure; partial system shutdown; Costs; Frequency; Hardware; Niobium compounds; Performance loss; Redundancy; Stress; Temperature; Titanium compounds; Voltage; microarchitecture; multiprocessors; reliability; yield;
fLanguage
English
Publisher
IEEE
Conference_Titel
Design, Automation & Test in Europe Conference & Exhibition, 2009. DATE '09.
Conference_Location
Nice
ISSN
1530-1591
Print_ISBN
978-1-4244-3781-8
Type
conf
DOI
10.1109/DATE.2009.5090714
Filename
5090714