DocumentCode :
3359250
Title :
Cobra: A comprehensive bundle-based reliable architecture
Author :
Pellegrini, Alessandro ; Bertacco, Valeria
Author_Institution :
Adv. Comput. Archit. Lab., Univ. of Michigan, Ann Arbor, MI, USA
fYear :
2013
fDate :
15-18 July 2013
Firstpage :
247
Lastpage :
254
Abstract :
The declining robustness of transistors and their ever-denser integration threaten the dependability of future microprocessors. Classic multicores offer a simple solution to overcome hardware defects: faulty processors can be disabled without affecting the rest of the system. However, this approach quickly becomes impractical at high fault rates. Recently, distributed computer architectures have been proposed to mitigate the effects of faulty transistors by utilizing fine-grained hardware reconfiguration, managed by fully decoupled control logic. Unfortunately, such solutions trade flexibility for execution locality, and thus they do not scale to large systems. To overcome this issue, we propose Cobra, a distributed, scalable, highly parallel reliable architecture. Cobra is a service-based architecture where groups of dynamic instructions flow independently through the system, making use of the available hardware resources. Cobra organizes the system's units dynamically using a novel resource assignment that preserves locality and limits communication overhead. Our experiments show that Cobra is extremely dependable, and outperforms classic multicores when subjected to 5 or more defects per 100 million transistors. We also show that Cobra operates 80% faster than previous state-of-the-art solutions on multi-programmed SPEC CPU2006 workloads and that it improves cache hit rate by up to 62%. Our runtime fault detection techniques have a performance impact of only 3%.
Keywords :
multiprocessing systems; parallel architectures; Cobra; comprehensive bundle-based reliable architecture; computer architectures; faulty processors; faulty transistors; fine-grained hardware reconfiguration; fully decoupled control logic; multiprogrammed SPEC CPU2006 workloads; runtime fault detection techniques; service-based architecture; Built-in self-test; Computer architecture; Fault detection; Hardware; Reliability; Scheduling
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIII), 2013 International Conference on
Conference_Location :
Agios Konstantinos
Type :
conf
DOI :
10.1109/SAMOS.2013.6621131
Filename :
6621131