Title :
Using virtual load/store queues (VLSQs) to reduce the negative effects of reordered memory instructions
Author :
Jaleel, Aamer ; Jacob, Bruce
Author_Institution :
Dept. of Electr. & Comput. Eng., Maryland Univ., College Park, MD, USA
Abstract :
The use of large instruction windows coupled with aggressive out-of-order and prefetching capabilities has provided significant improvements in processor performance. In this paper, we quantify the effects of increased out-of-order aggressiveness on a processor's memory ordering/consistency model as well as an application's cache behavior. We observe that increasing the reorder buffer size causes less than one third of issued memory instructions to be executed in actual program order. We show that increasing the reorder buffer size from 80 to 512 entries results in an increase in the frequency of memory traps by a factor of six and an increase in total execution overhead by 10-40%. Additionally, we observe that the reordering of memory instructions increases the L1 data cache accesses by 10-60% and the L1 data cache misses by 10-20%. These findings reveal that increased out-of-order capability can waste energy in two ways. First, re-fetching and re-executing instructions flushed due to traps requires the fetch, map, and execution units to dissipate energy on work that has already been done. Second, an increase in the number of cache accesses and cache misses needlessly dissipates energy. Both of these side effects can be related to the reordering of memory instructions. Thus, to avoid wasting both energy and performance, we propose a virtual load/store queue (VLSQ) within the existing physical load/store queue. The VLSQ reduces the reordering of memory instructions by limiting the number of memory instructions visible to the select and issue logic. We show that VLSQs can reduce trap overhead, cache accesses, and cache misses by as much as 45%, 50%, and 15%, respectively, when compared to traditional load/store queues. We observe that these reductions yield net power savings of 10-50% with a degradation in performance of 1-5%.
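The idea of limiting how many memory instructions the select logic can see may be illustrated with a toy model. The sketch below is not the authors' simulator or hardware design. It assumes a hypothetical queue that issues at most one memory operation per cycle, a hypothetical per-operation ready cycle, and an oldest-ready-first select policy restricted to the oldest `window` entries. The names `issue_order`, `out_of_order_count`, and `ready_cycles` are made up for this illustration.

```python
from collections import deque


def issue_order(ready_cycles, window):
    """Return the order in which memory ops issue, at most one per cycle.

    ready_cycles[i] is the (hypothetical) cycle at which op i, numbered in
    program order, becomes ready. Only the oldest `window` un-issued ops
    are visible to the select logic; this models the virtual queue size.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    pending = deque(range(len(ready_cycles)))  # program order, oldest first
    order = []
    cycle = 0
    while pending:
        visible = list(pending)[:window]
        # Assumed policy: pick the oldest visible op that is ready.
        pick = next((i for i in visible if ready_cycles[i] <= cycle), None)
        if pick is not None:
            pending.remove(pick)
            order.append(pick)
        cycle += 1
    return order


def out_of_order_count(order):
    """Count ops that issued ahead of at least one older op."""
    return sum(
        1
        for pos, op in enumerate(order)
        if any(later < op for later in order[pos + 1:])
    )


# Hypothetical ready times for eight memory ops in program order.
ready = [5, 0, 1, 2, 6, 0, 0, 3]
for w in (8, 2, 1):
    order = issue_order(ready, w)
    print(w, order, out_of_order_count(order))
```

In this toy model, shrinking the visible window lowers the number of reordered operations, and a window of one forces strict program order, at the cost of idle cycles while the oldest operation waits. That mirrors, only qualitatively, the trade-off between reduced reordering and a small performance loss described in the abstract.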
Keywords :
cache storage; computer power supplies; instruction sets; memory architecture; queueing theory; virtual storage; aggressive prefetching; cache behavior; data cache access; instruction windows; issue logic; memory consistency model; memory ordering model; out-of-order aggressiveness; processor performance; reorder buffer sizes; reordered memory instructions; select logic; virtual load/store queues; Buffer storage; Computer aided instruction; Computer architecture; Educational institutions; Frequency; Jacobian matrices; Logic; Out of order; Pipelines; Waste handling;
Conference_Title :
High-Performance Computer Architecture, 2005. HPCA-11. 11th International Symposium on
Print_ISBN :
0-7695-2275-0
DOI :
10.1109/HPCA.2005.42