DocumentCode :
177308
Title :
Fractal++: Closing the performance gap between fractal and conventional coherence
Author :
Voskuilen, Gwendolyn ; Vijaykumar, T.N.
Author_Institution :
Sch. of Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN, USA
fYear :
2014
fDate :
14-18 June 2014
Firstpage :
409
Lastpage :
420
Abstract :
Cache coherence protocol bugs can cause multicores to fail. Existing coherence verification approaches incur state explosion at small scales or require considerable human effort. As protocols' complexity and multicores' core counts increase, verification continues to be a challenge. Recently, researchers proposed fractal coherence which achieves scalable verification by enforcing observational equivalence between sub-systems in the coherence protocol. A larger sub-system is verified implicitly if a smaller sub-system has been verified. Unfortunately, fractal protocols suffer from two fundamental limitations: (1) indirect-communication: sub-systems cannot directly communicate and (2) partially-serial-invalidations: cores must be invalidated in a specific, serial order. These limitations disallow common performance optimizations used by conventional directory protocols: reply-forwarding where caches communicate directly and parallel invalidations. Therefore, fractal protocols lack performance scalability while directory protocols lack verification scalability. To enable both performance and verification scalability, we propose Fractal++ which employs a new class of protocol optimizations for verification-constrained architectures: decoupled-replies, contention-hints, and fully-parallel-fractal-invalidations. The first two optimizations allow reply-forwarding-like performance while the third optimization enables parallel invalidations in fractal protocols. Unlike conventional protocols, Fractal++ preserves observational equivalence and hence is scalably verifiable. In 32-core simulations of single- and four-socket systems, Fractal++ performs nearly as well as a directory protocol while providing scalable verifiability whereas the best-performing previous fractal protocol performs 8% on average and up to 26% worse with a single-socket and 12% on average and up to 34% worse with a longer-latency multi-socket system.
Keywords :
cache storage; formal verification; parallel processing; 32-core simulations; Fractal++; cache coherence protocol bugs; coherence verification approaches; contention-hints; decoupled-replies; directory protocols; four-socket system; fractal coherence; fractal protocols; fully-parallel-fractal-invalidations; indirect-communication; longer-latency multisocket system; multicores; observational equivalence; parallel invalidations; partially-serial-invalidations; performance gap; performance optimizations; performance scalability; protocol optimizations; reply-forwarding; single-socket system; state explosion; verification scalability; verification-constrained architectures; Coherence; Erbium; Fractals; Multicore processing; Optimization; Protocols; Scalability;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Architecture (ISCA), 2014 ACM/IEEE 41st International Symposium on
Conference_Location :
Minneapolis, MN
Print_ISBN :
978-1-4799-4396-8
Type :
conf
DOI :
10.1109/ISCA.2014.6853211
Filename :
6853211