Title :
Dynamic cluster resource allocations for jobs with known and unknown memory demands
Author :
Xiao, Li ; Chen, Songqing ; Zhang, Xiaodong
Author_Institution :
Dept. of Comput. Sci., Coll. of William & Mary, Williamsburg, VA, USA
fDate :
3/1/2002
Abstract :
The cluster system we consider for load sharing is a compute farm: a pool of networked server nodes that provides high-performance computing for CPU-intensive, memory-intensive, and I/O-active jobs in batch mode. Existing resource management systems mainly aim to balance CPU load among server nodes. As CPU chips advance rapidly, improvements in memory and disk access speeds lag significantly behind improvements in CPU speed, which increases the penalty for data movement, such as page faults and I/O operations, relative to normal CPU operations. To reduce the memory resource contention caused by page faults and I/O activities, we have developed and examined load sharing policies that consider the effective use of global memory in addition to CPU load balancing in clusters. We study two types of application workloads: 1) memory demands are known in advance or are predictable, and 2) memory demands are unknown and change dynamically during execution. Besides using workload traces with known memory demands, we have also instrumented the kernel to collect different types of workload execution traces that capture dynamic memory access patterns. Through several groups of trace-driven simulations, we show that our proposed policies effectively improve overall job execution performance by making good use of both CPU and memory resources under both known and unknown memory demands.
Keywords :
distributed memory systems; operating system kernels; resource allocation; virtual machines; workstation clusters; CPU loads; compute farm; dynamic cluster resource allocations; dynamic memory access patterns; job execution performance; kernel instrumentation; load sharing; memory demands; memory-intensive workloads; networked server nodes; trace-driven simulations; workload execution traces; Resource management
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on