Title :
Excel-NUMA: toward programmability, simplicity, and high performance
Author :
Zhang, Zheng ; Cintra, Marcelo ; Torrellas, Josep
Author_Institution :
Hewlett-Packard Co., Palo Alto, CA, USA
fDate :
2/1/1999
Abstract :
While hardware-coherent scalable shared-memory multiprocessors are relatively easy to program, they still require substantial programming effort to deliver high performance. Specifically, to minimize remote accesses, data must be carefully laid out in memory for locality, and application working sets must be carefully tuned for the caches. It has been claimed that this programming effort is less necessary in hardware COMA machines such as Flat-COMA, thanks to automatic line-based data migration. Unfortunately, Flat-COMA is complex to design. Consequently, we would like a machine that is as programmable as Flat-COMA, as simple as plain CC-NUMA, and that outperforms both. This paper presents our proposal: Excel-NUMA (EX-NUMA). The idea is to exploit the fact that, after a memory line is written and cached, the storage that kept the line in memory is unused. We use that storage to temporarily hold remote data displaced from the local caches. This enables automatic data migration, as in Flat-COMA, enhancing programmability. The hardware required to manage the system is a simple, local module added to a CC-NUMA; the global cache coherence protocol is not changed. Simulations of Splash2 applications show that EX-NUMA outperforms CC-NUMA and Flat-COMA in every application and eliminates most of the conflict misses.
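The storage-reuse idea in the abstract can be illustrated with a toy model. The sketch below is not the EX-NUMA hardware design. It only assumes that a local memory slot becomes idle once its line is written in a cache, and that an idle slot may park one remote line displaced from the cache until the local line is written back. The class name, method names, and addresses are all hypothetical, and coherence states and the handling of dirty remote data are ignored.

```python
class ToyNode:
    """Toy model of one node; every name here is hypothetical."""

    def __init__(self, local_lines):
        # Home memory slots for the lines owned by this node.
        # True: the slot holds valid data. False: the line is dirty in a
        # cache, so the memory copy is stale and the slot is idle.
        self.slot_valid = {addr: True for addr in local_lines}
        # Idle slot address -> remote line address parked there.
        self.parked = {}

    def write_and_cache(self, addr):
        """A local line is written in a cache; its memory slot becomes idle."""
        self.slot_valid[addr] = False

    def displace_remote(self, remote_addr):
        """A remote line leaves the cache; park it in an idle slot if any."""
        for addr, valid in self.slot_valid.items():
            if not valid and addr not in self.parked:
                self.parked[addr] = remote_addr
                return True
        return False  # no idle slot available (assumption: line is dropped)

    def lookup_remote(self, remote_addr):
        """True if a later miss on remote_addr can be served locally."""
        return remote_addr in self.parked.values()

    def write_back(self, addr):
        """The dirty local line returns to memory and reclaims its slot."""
        self.parked.pop(addr, None)
        self.slot_valid[addr] = True


node = ToyNode(local_lines=[0, 1])
node.write_and_cache(0)              # slot 0 is now idle
print(node.displace_remote(100))     # remote line 100 is parked in slot 0
print(node.lookup_remote(100))       # it can now be found in local memory
node.write_back(0)                   # slot 0 is reclaimed by its owner
print(node.lookup_remote(100))       # the parked remote line is gone
```

In this model the parked remote data lives only as long as the local line stays dirty in a cache, which mirrors the word "temporarily" in the abstract. How the real machine chooses slots and reclaims them is not specified here.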
Keywords :
performance evaluation; protocols; shared memory systems; EX-NUMA; Excel-NUMA; Splash2 applications; automatic line-based data migration; global cache coherence protocol; high performance; programmability; scalable shared-memory multiprocessors; Access protocols; Automatic programming; Cache storage; Computational modeling; Costs; Graphics; Hardware; Proposals; Random access memory; Silicon
Journal_Title :
Computers, IEEE Transactions on