Title :
Two-Level Main Memory Co-Design: Multi-threaded Algorithmic Primitives, Analysis, and Simulation
Author :
Bender, Michael A. ; Berry, Jonathan ; Hammond, Simon D. ; Hemmert, K. Scott ; McCauley, Samuel ; Moore, Branden ; Moseley, Benjamin ; Phillips, Cynthia A. ; Resnick, David ; Rodrigues, Arun
Author_Institution :
Stony Brook Univ., Stony Brook, NY, USA
Abstract :
A fundamental challenge for supercomputer architecture is that processors cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. As the number of cores per chip increases, and traditional DDR DRAM speeds stagnate, the problem is only getting worse. A variety of non-DDR 3D memory technologies (Wide I/O 2, HBM) offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. However, such a packaging scheme cannot contain sufficient memory capacity for a node. It seems likely that future systems will require at least two levels of main memory: high-bandwidth, low-power memory near the processor and low-bandwidth, high-capacity memory further away. This near memory will probably not have significantly lower latency than the far memory. This, combined with the large size of the near memory (multiple GB) and power constraints, may make it difficult to treat it as a standard cache. In this paper, we explore some of the design space for a user-controlled multi-level main memory. We present algorithms designed for the heterogeneous bandwidth, using streaming to exploit data locality. We consider algorithms for the fundamental application of sorting. Our algorithms asymptotically reduce memory-block transfers under certain architectural parameter settings. We use and extend Sandia National Laboratories' SST simulation capability to demonstrate the relationship between increased bandwidth and improved algorithmic performance. Memory access counts from simulations corroborate predicted performance. This co-design effort suggests implementing two-level main memory systems may improve memory performance in fundamental applications.
Keywords :
DRAM chips; integrated circuit design; multi-threading; parallel architectures; parallel machines; CPU; DDR DRAM speeds; Sandia National Laboratories SST simulation; architectural parameter settings; data locality; high-bandwidth low-power memory; low-bandwidth high-capacity memory; memory capacity; memory performance; memory-bandwidth bound applications; memory-block transfers; multithreaded algorithmic primitives; nonDDR 3D memory technologies; packaging scheme; power constraints; silicon interposer; supercomputer architecture; two-level main memory codesign; user-controlled multilevel main memory; Algorithm design and analysis; Bandwidth; Computational modeling; Computer architecture; Hardware; Random access memory; Sorting; Memory bound; algorithmic co-design; simulation; two-level main memory
Conference_Title :
Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
Conference_Location :
Hyderabad
DOI :
10.1109/IPDPS.2015.94