Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Pittsburgh, Pittsburgh, PA, USA
Abstract :
Process variations in integrated circuits have significant impact on their performance, leakage, and stability. This is particularly evident in large, regular, and dense structures such as DRAMs. DRAMs are built using minimized transistors with presumably uniform speed in an organized array structure. Process variation can introduce latency disparity among different memory arrays. With the proliferation of 3D stacking technology, DRAMs become a favorable choice for stacking on top of a multicore processor as a last level cache for large capacity, high bandwidth, and low power. Hence, variations in bank speed create a unique problem of nonuniform cache accesses in 3D space. In this paper, we investigate cache management techniques for tolerating process variation in a 3D DRAM stacked onto a multicore processor. We modeled the process variation in a four-layer DRAM memory, including cell transistor, capacitor trench, and peripheral circuit, to characterize the latency and retention time variations among different banks. As a result, the notion of fast and slow banks from the core´s standpoint is no longer associated with their physical distances with the banks. They are determined by the different bank latencies due to process variation. We develop cache migration schemes that utilize fast banks while limiting the cost due to migration. Our experiments show that there is a great performance benefit in exploiting fast memory banks through migration. On average, a variation-aware management can improve the performance of a workload over the baseline (where one of the slowest bank speed is assumed for all banks) by 16.5 percent. We are also only 0.8 percent away in performance from an ideal memory where no process variation is present.
Keywords :
DRAM chips; cache storage; multiprocessing systems; 3D DRAM; 3D die-stacked multicore processor; 3D space; 3D stacking technology; array structure; bank latency; bank speed; cache migration schemes; capacitor trench; cell transistor; four-layer DRAM memory; latency disparity; memory arrays; peripheral circuit; process variation-aware nonuniform cache management; retention time variations; variation-aware management; Arrays; Decoding; Microprocessors; Random access memory; Three dimensional displays; Transistors; 3D die stacking; DRAM; NUCA; Process variation;