DocumentCode :
1519443
Title :
Macro Data Load: An Efficient Mechanism for Enhancing Loaded Data Reuse
Author :
Jin, Lei ; Cho, Sangyeun
Author_Institution :
Dept. of Comput. Sci., Univ. of Pittsburgh, Pittsburgh, PA, USA
Volume :
60
Issue :
4
fYear :
2011
fDate :
4/1/2011
Firstpage :
526
Lastpage :
537
Abstract :
This paper presents a study on macro data load, a novel mechanism to increase the amount of loaded data reuse within a processor. A macro data load brings into the processor the widest data item that the cache port allows. In a 64-bit processor, for example, a byte load will bring a full 64-bit word from the cache and save it in an internal hardware structure, while using for itself only the specified byte of that word. The saved data can be opportunistically reused internally by later loads, reducing the number of relatively more expensive cache accesses. We present a comprehensive availability study using a generalized memory data reuse table (MDRT) to quantify available memory data reuse opportunities in a set of benchmark programs drawn from the SPEC2k and MiBench suites, and to demonstrate the efficacy of the proposed scheme. The macro data load mechanism is shown to open up significantly more loaded data reuse opportunities than previous schemes with no support for spatial locality. We observe 15.1 percent (SPEC2k integer), 20.9 percent (SPEC2k floating-point), and 45.8 percent (MiBench) more load-to-load forwarding instances when a 256-entry MDRT is used. We also describe a modified load store queue design as a possible implementation of the proposed concept. Our quantitative study using a realistic processor model shows that 21.3 percent, 14.8 percent, and 23.6 percent of L1 cache accesses in the SPEC2k integer, floating-point, and MiBench programs, respectively, can be eliminated, resulting in average related energy reductions of 11.4 percent, 9.0 percent, and 14.3 percent.
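The reuse idea in the abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the paper's MDRT or load store queue design: the class name `MacroLoadTable`, the direct-mapped indexing by aligned address, the invalidate-on-store policy, and the restriction that an access never crosses an 8-byte boundary are all hypothetical choices made for illustration.

```python
class MacroLoadTable:
    """Toy model of macro data load (illustrative assumptions throughout)."""

    WORD_BYTES = 8  # 64-bit cache port, as in the abstract's example

    def __init__(self, entries=256):
        self.entries = entries
        self.table = {}          # slot index -> (aligned address, 8-byte word)
        self.cache_accesses = 0  # loads that had to read the cache
        self.forwarded = 0       # loads served from previously saved data

    def _locate(self, addr, size):
        base = addr - addr % self.WORD_BYTES
        offset = addr - base
        # Assumption: an access stays inside one aligned 64-bit word.
        assert offset + size <= self.WORD_BYTES
        slot = (base // self.WORD_BYTES) % self.entries  # assumed direct-mapped
        return base, offset, slot

    def load(self, memory, addr, size):
        """Return `size` bytes at `addr`, reusing a saved word when possible."""
        base, offset, slot = self._locate(addr, size)
        entry = self.table.get(slot)
        if entry is not None and entry[0] == base:
            self.forwarded += 1
            word = entry[1]
        else:
            # Macro load: fetch the full word even for a narrower request.
            self.cache_accesses += 1
            word = bytes(memory[base:base + self.WORD_BYTES])
            self.table[slot] = (base, word)
        return word[offset:offset + size]

    def store(self, memory, addr, data):
        """Write to memory and drop any saved copy of the touched word."""
        base, _, slot = self._locate(addr, len(data))
        memory[addr:addr + len(data)] = data
        entry = self.table.get(slot)
        if entry is not None and entry[0] == base:
            del self.table[slot]  # assumed policy: invalidate, do not update
```

For example, with `memory = bytearray(range(64))`, a one-byte load at address 9 reads the cache once and saves the word covering addresses 8 to 15; a following two-byte load at address 12 is then served from the saved word without another cache access.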
Keywords :
benchmark testing; cache storage; computer architecture; microprocessor chips; 64-bit processor; MDRT; MiBench suite; SPEC2k suite; benchmark program; cache port; load store queue design; loaded data reuse; macro data load; memory data reuse table; superscalar processor architecture; word length 64 bit; cache memory; load store queue; low-power processor design
fLanguage :
English
Journal_Title :
Computers, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9340
Type :
jour
DOI :
10.1109/TC.2010.131
Filename :
5487494