Title :
Resilient optically connected memory systems using dynamic bit-steering [Invited]
Author :
Brunina, Daniel ; Lai, Caroline P. ; Liu, Dawei ; Garg, Ajay S. ; Bergman, Keren
Author_Institution :
Dept. of Electr. Eng., Columbia Univ., New York, NY, USA
Abstract :
Resilience is becoming an increasingly critical performance requirement for future large-scale computing systems. In data center and high-performance computing systems with many thousands of nodes, errors in main memory can be a significant source of failures. As a result, large-scale memory systems must employ advanced error detection and correction techniques to mitigate failures. Memory devices are primarily designed for density, optimizing memory capacity and throughput, rather than resilience. A strict focus on memory performance instead of resilience risks undermining the overall stability of next-generation computers. In this work, we leverage an optically connected memory system to optimize both memory performance and resilience. A multicast-capable optical interconnection network replaces the traditional electronic bus between a processor and its main memory, allowing for a novel error-correction technique based on dynamic bit-steering. As compared to an electronically connected approach, we demonstrate significantly higher memory bandwidths and reduced latencies, in addition to a 700× improvement in resilience.
Keywords :
error correction codes; optical interconnections; optical storage; dynamic bit-steering; error-correction technique; high memory bandwidths; memory performance; multicast-capable optical interconnection network; processor; reduced latencies; resilient optically connected memory systems; Adaptive optics; Error correction codes; Optical fiber networks; Optical interconnections; Optical receivers; Resilience; SDRAM; Memory architecture; Optics in computing; Photonic switching systems
Journal_Title :
Optical Communications and Networking, IEEE/OSA Journal of
DOI :
10.1364/JOCN.4.00B151