DocumentCode :
2397327
Title :
Reducing the latency of L2 misses in shared-memory multiprocessors through on-chip directory integration
Author :
Acacio, Manuel E. ; Gonzalez, José ; García, José M. ; Duato, José
Author_Institution :
Dpto. Ing. y Tecnologia de Computadores, Murcia Univ., Spain
fYear :
2002
fDate :
2002
Firstpage :
368
Lastpage :
375
Abstract :
Recent technology improvements allow multiprocessor designers to put some key components inside the processor chip, such as the memory controller and the network interface. In this paper, we exploit such an integration scale, presenting a new three-level directory architecture aimed at reducing the long L2 miss latencies and the memory overhead that characterize cc-NUMA machines and limit their scalability. The proposed architecture is based on the integration into the processor chip of the directory controller and a small first-level directory cache that stores precise information for the most recently referenced memory lines, as the means to reduce miss latencies. The second- and third-level directories are located near the main memory and are accessed only when a directory entry for a given memory line is not present in the first-level directory. This off-chip structure achieves the performance of a large and non-scalable full-map directory with a very significant reduction in the memory overhead. Using execution-driven simulations, we show that substantial latency reductions can be obtained by using the proposed directory architecture. Load, store and read-modify-write misses are significantly accelerated (latency reductions of more than 35% in some cases). These reductions translate into important improvements in final application performance (reductions of up to 20% in execution time).
Keywords :
cache storage; delays; microprocessor chips; parallel architectures; performance evaluation; shared memory systems; 3-level directory architecture; L2 miss latency reduction; application performance; cache-coherent nonuniform memory access; cc-NUMA machines; directory cache; directory controller; execution time; execution-driven simulations; integration scale; load misses; main memory; memory controller; memory overhead reduction; network interface; off-chip structure; on-chip directory integration; performance; read-modify-write misses; recently referenced memory lines; scalability; shared-memory multiprocessors; store misses; technology improvements; Acceleration; Access protocols; Broadcasting; Computer networks; Delay; Hardware; Hip; Multiprocessor interconnection networks; Scalability; Sun;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel, Distributed and Network-based Processing, 2002. Proceedings. 10th Euromicro Workshop on
Conference_Location :
Canary Islands
Print_ISBN :
0-7695-1444-8
Type :
conf
DOI :
10.1109/EMPDP.2002.994312
Filename :
994312