Title :
Network-aware resource management for scalable data analytics frameworks
Author :
Thomas Renner;Lauritz Thamsen;Odej Kao
Author_Institution :
Technische Universität Berlin, Germany
Abstract :
Sharing cluster resources between multiple frameworks, applications and datasets is important for organizations doing large-scale data analytics. It improves cluster utilization, avoids standalone clusters running only a single framework and allows data scientists to choose the best framework for each analysis task. Current systems for cluster resource management like YARN or Mesos achieve resource sharing using containers. Analytics frameworks execute their tasks in these containers. However, container placement is currently based predominantly on available computing capabilities in terms of cores and memory, and neglects the network topology and data locations. In this paper, we propose a container placement approach that (a) takes the network topology into account to prevent congestion in the core network and (b) places containers close to input data to improve data locality and reduce remote disk reads in distributed file systems. The main advantage of introducing topology- and data-awareness at the level of container placement is that multiple application frameworks benefit from the improvements. We present a prototype integrated with Hadoop YARN and an evaluation with workloads consisting of different applications and datasets using Apache Flink. Our evaluation on a 64-core cluster, in which nodes are connected through a fat-tree topology, shows promising results with speedups of up to 67% for network-intensive workloads.
Keywords :
"Containers","Data analysis","Resource management","Network topology","Yarn","Bandwidth","Topology"
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
DOI :
10.1109/BigData.2015.7364083