Title :
Evaluating Dynamics and Bottlenecks of Memory Collaboration in Cluster Systems
Author :
Samih, Ahmad ; Wang, Ren ; Maciocco, Christian ; Tai, Tsung-Yuan Charlie ; Duan, Ronghui ; Duan, Jiangang ; Solihin, Yan
Author_Institution :
Dept. of Electrical & Comput. Eng., North Carolina State Univ., Raleigh, NC, USA
Abstract :
With the fast development of highly integrated distributed systems (cluster systems), designers face interesting memory hierarchy design choices while attempting to avoid the notorious disk swapping. Swapping to free remote memory through Memory Collaboration has proven more cost-effective than over-provisioning the cluster for peak-load requirements. Recent memory collaboration studies propose several ways of accessing under-utilized remote memory in static system configurations, without a detailed exploration of dynamic memory collaboration. Dynamic collaboration is important given the run-time fluctuations in memory usage in clustered systems. Further, as interest in memory collaboration grows, it is crucial to understand the existing performance bottlenecks, overheads, and potential optimizations. In this paper, we address these two issues. First, we propose an Autonomous Collaborative Memory System (ACMS) that manages memory resources dynamically at run time to optimize performance. We implement a prototype of the proposed ACMS, evaluate it with a wide range of real-world applications, and show up to 3× performance speedup compared to a non-collaborative memory system, without perceivable performance impact on the nodes that provide memory. Second, we analyze in depth the end-to-end memory collaboration overhead and pinpoint the corresponding bottlenecks.
Keywords :
pattern clustering; software engineering; storage management; ACMS; autonomous collaborative memory system; cluster systems; disk swapping; dynamic memory collaboration; end-to-end memory collaboration; free remote memory; highly-integrated distributed systems; memory hierarchy design; memory resources; noncollaborative memory system; peak load requirements; real-world applications; remote memory; run-time memory usage fluctuations; static system configurations; Collaboration; Heuristic algorithms; Memory management; Monitoring; Protocols; Random access memory; Servers; Memory sharing; clustered architecture; disaggregated memory; InfiniBand; kernel swapping; memory collaboration; remote memory;
Conference_Title :
Cluster, Cloud and Grid Computing (CCGrid), 2012 12th IEEE/ACM International Symposium on
Conference_Location :
Ottawa, ON
Print_ISBN :
978-1-4673-1395-7
DOI :
10.1109/CCGrid.2012.59