DocumentCode
2397327
Title
Reducing the latency of L2 misses in shared-memory multiprocessors through on-chip directory integration
Author
Acacio, Manuel E. ; Gonzalez, José ; García, José M. ; Duato, José
Author_Institution
Dpto. Ing. y Tecnologia de Computadores, Murcia Univ., Spain
fYear
2002
fDate
2002
Firstpage
368
Lastpage
375
Abstract
Recent technology improvements allow multiprocessor designers to place key components, such as the memory controller and the network interface, inside the processor chip. In this paper, we exploit this scale of integration and present a new three-level directory architecture aimed at reducing the long L2 miss latencies and the memory overhead that characterize cc-NUMA machines and limit their scalability. The proposed architecture integrates into the processor chip the directory controller and a small first-level directory cache that stores precise information for the most recently referenced memory lines, as a means of reducing miss latencies. The second- and third-level directories are located near main memory and are accessed only when the directory entry for a given memory line is not present in the first-level directory. This off-chip structure achieves the performance of a large, non-scalable full-map directory with a very significant reduction in memory overhead. Using execution-driven simulations, we show that substantial latency reductions can be obtained with the proposed directory architecture. Load, store and read-modify-write misses are significantly accelerated (latency reductions of more than 35% in some cases). These reductions translate into important improvements in final application performance (reductions of up to 20% in execution time).
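The lookup order described in the abstract can be illustrated with a toy model. This is only a sketch under stated assumptions, not the authors' design: the class name, the LRU replacement, the spilling of evicted entries to the next level, and the representation of a directory entry as a set of sharer node IDs are all hypothetical choices made for illustration. The abstract states only that the first level is a small on-chip cache and that the second and third levels are consulted when the first level misses.

```python
from collections import OrderedDict


class ThreeLevelDirectory:
    """Toy model: a small on-chip first level backed by two off-chip levels.

    Assumptions (not from the abstract): LRU replacement, evicted entries
    spill to the next level, and an entry is a frozenset of sharer node IDs.
    """

    def __init__(self, l1_capacity, l2_capacity):
        self.l1 = OrderedDict()  # on-chip first-level directory cache
        self.l2 = OrderedDict()  # off-chip second level (assumed to be a cache)
        self.l3 = {}             # off-chip third level (assumed to hold the rest)
        self.l1_capacity = l1_capacity
        self.l2_capacity = l2_capacity

    def _insert(self, level, capacity, line, sharers, spill):
        # Insert as most recently used; pass the LRU victim to `spill` if full.
        level[line] = sharers
        level.move_to_end(line)
        if len(level) > capacity:
            victim, victim_sharers = level.popitem(last=False)
            spill(victim, victim_sharers)

    def _install_l2(self, line, sharers):
        self._insert(self.l2, self.l2_capacity, line, sharers,
                     self.l3.__setitem__)

    def _install_l1(self, line, sharers):
        self._insert(self.l1, self.l1_capacity, line, sharers,
                     self._install_l2)

    def lookup(self, line):
        """Return (level_that_answered, sharers) and promote the entry on-chip."""
        if line in self.l1:
            self.l1.move_to_end(line)
            return 1, self.l1[line]
        # Off-chip levels are consulted only after a first-level miss.
        if line in self.l2:
            level, sharers = 2, self.l2.pop(line)
        else:
            level, sharers = 3, self.l3.pop(line, frozenset())
        self._install_l1(line, sharers)
        return level, sharers

    def add_sharer(self, line, node):
        """Record that `node` holds a copy of `line`."""
        _, sharers = self.lookup(line)  # guarantees the entry is now in l1
        self.l1[line] = sharers | {node}
```

In this model a repeated reference to a recently used line is answered by level 1, which corresponds to the case where the abstract expects the latency benefit. References that have fallen out of the small on-chip cache are answered by level 2 or 3.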
Keywords
cache storage; delays; microprocessor chips; parallel architectures; performance evaluation; shared memory systems; 3-level directory architecture; L2 miss latency reduction; application performance; cache-coherent nonuniform memory access; cc-NUMA machines; directory cache; directory controller; execution time; execution-driven simulations; integration scale; load misses; main memory; memory controller; memory overhead reduction; network interface; off-chip structure; on-chip directory integration; performance; read-modify-write misses; recently referenced memory lines; scalability; shared-memory multiprocessors; store misses; technology improvements; Acceleration; Access protocols; Broadcasting; Computer networks; Delay; Hardware; Hip; Multiprocessor interconnection networks; Scalability; Sun;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel, Distributed and Network-based Processing, 2002. Proceedings. 10th Euromicro Workshop on
Conference_Location
Canary Islands
Print_ISBN
0-7695-1444-8
Type
conf
DOI
10.1109/EMPDP.2002.994312
Filename
994312