Title :
Using multi-level cell STT-RAM for fast and energy-efficient local checkpointing
Author :
Ping Chi ; Cong Xu ; Tao Zhang ; Xiangyu Dong ; Yuan Xie
Author_Institution :
Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
Abstract :
High reliability, availability, and serviceability are critical for modern large-scale computing systems. As an effective error recovery mechanism, checkpointing is widely used in such systems to help them survive unexpected failures. Conventional checkpointing schemes, however, are time-consuming because of the limited I/O bandwidth between the DRAM-based main memory and the backup storage. To mitigate the checkpoint overhead, we propose a fast local checkpointing scheme that leverages Multi-Level Cell (MLC) STT-RAM. We take advantage of the unique features of MLC STT-RAM to accelerate local checkpointing. Our experimental results show that the average performance overhead is less than 1% on a multi-programmed four-core process node with a 1-second local checkpoint interval. The evaluation results also show that MLC STT-RAM is an energy-efficient solution.
Keywords :
checkpointing; microcomputers; random-access storage; DRAM-based main memory; MLC STT-RAM; backup storage; checkpointing schemes; error recovery mechanism; large-scale computing systems; multilevel cell STT-RAM; spin transfer torque random access memory; Checkpointing; Magnetic tunneling; Nonvolatile memory; Phase change random access memory; Resistance; Switches;
Conference_Title :
Computer-Aided Design (ICCAD), 2014 IEEE/ACM International Conference on
Conference_Location :
San Jose, CA
DOI :
10.1109/ICCAD.2014.7001367