Title :
DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism
Author :
Choi, Byn ; Komuravelli, Rakesh ; Sung, Hyojin ; Smolinski, Robert ; Honarmand, Nima ; Adve, Sarita V. ; Adve, Vikram S. ; Carter, Nicholas P. ; Chou, Ching-Tsun
Author_Institution :
Dept. of Comput. Sci., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
Abstract :
For parallelism to become tractable for mass programmers, shared-memory languages and environments must evolve to enforce disciplined practices that ban "wild shared-memory behaviors;" e.g., unstructured parallelism, arbitrary data races, and ubiquitous non-determinism. This software evolution is a rare opportunity for hardware designers to rethink hardware from the ground up to exploit opportunities exposed by such disciplined software models. Such a co-designed effort is more likely to achieve many-core scalability than a software-oblivious hardware evolution. This paper presents DeNovo, a hardware architecture motivated by these observations. We show how a disciplined parallel programming model greatly simplifies cache coherence and consistency, while enabling a more efficient communication and cache architecture. The DeNovo coherence protocol is simple because it eliminates transient states - verification using model checking shows 15X fewer reachable states than a state-of-the-art implementation of the conventional MESI protocol. The DeNovo protocol is also more extensible. Adding two sophisticated optimizations, flexible communication granularity and direct cache-to-cache transfers, did not introduce additional protocol states (unlike MESI). Finally, DeNovo shows better cache hit rates and network traffic, translating to better performance and energy. Overall, a disciplined shared-memory programming model allows DeNovo to seamlessly integrate message passing-like interactions within a global address space for improved design complexity, performance, and efficiency.
Keywords :
cache storage; message passing; parallel memories; parallel programming; protocols; shared memory systems; ubiquitous computing; DeNovo coherence protocol; MESI protocol; arbitrary data race; cache architecture; cache hit rate; design complexity; direct cache-to-cache transfer; disciplined parallelism; flexible communication granularity; hardware architecture; hardware designer; many-core scalability; mass programmer; memory hierarchy; network traffic; parallel programming model; shared-memory language; shared-memory programming model; software evolution; software-oblivious hardware evolution; Arrays; Coherence; Hardware; Programming; Protocols; Software; Transient analysis
Conference_Title :
Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on
Conference_Location :
Galveston, TX
Print_ISBN :
978-1-4577-1794-9
DOI :
10.1109/PACT.2011.21