DocumentCode :
2635748
Title :
DRAM-Level Prefetching for Fully-Buffered DIMM: Design, Performance and Power Saving
Author :
Lin, Jiang ; Zheng, Hongzhong ; Zhu, Zhichun ; Zhang, Zhao ; David, Howard
Author_Institution :
Dept. of Electr. & Comput. Eng., Iowa State Univ., Ames, IA
fYear :
2007
fDate :
25-27 April 2007
Firstpage :
94
Lastpage :
104
Abstract :
We have studied DRAM-level prefetching for the fully buffered DIMM (FB-DIMM), which is designed for multi-core processors. FB-DIMM has a unique two-level interconnect structure: FB-DIMM channels at the first level connect the memory controller to the advanced memory buffers (AMBs), and DDR2 buses at the second level connect the AMBs to the DRAM chips. We propose an AMB prefetching method that prefetches memory blocks from the DRAM chips into the AMBs. It uses the redundant bandwidth between the DRAM chips and the AMBs but does not consume the crucial channel bandwidth. The proposed method fetches K memory blocks, each the size of an L2 cache block, around the demanded block, where K is a small value ranging from two to eight. The method may also reduce DRAM power consumption by merging some DRAM precharges and activations. Our cycle-accurate simulation shows an average performance improvement of 16% for single-core and multi-core workloads constructed from memory-intensive SPEC2000 programs with software cache prefetching enabled, and no workload shows a slowdown. We have found that the performance gain comes from the reduction of idle memory latency and the improvement of channel bandwidth utilization. We have also found that there is only a small overlap between the performance gains from AMB prefetching and those from software cache prefetching. The estimated power saving averages 15%.
Keywords :
DRAM chips; storage management; storage management chips; DRAM chip; DRAM power consumption; DRAM-level prefetching; L2 cache block; SPEC2000 program; channel bandwidth utilization; dual in-line memory module; dynamic random access memory; fully-buffered DIMM; idle memory latency; interconnect structure; memory block; memory controller; multicore processor; power saving; redundant bandwidth; software cache prefetching; Bandwidth; Energy consumption; Joining processes; Merging; Multicore processing; Performance gain; Prefetching; Process design; Random access memory; Software performance;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Performance Analysis of Systems & Software, 2007. ISPASS 2007. IEEE International Symposium on
Conference_Location :
San Jose, CA
Print_ISBN :
1-4244-1082-7
Electronic_ISBN :
1-4244-1082-7
Type :
conf
DOI :
10.1109/ISPASS.2007.363740
Filename :
4211026