DocumentCode
3680228
Title
Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server
Author
Ahsan Javed Awan;Mats Brorsson;Vladimir Vlassov;Eduard Ayguade
Author_Institution
Software &
fYear
2015
Firstpage
1
Lastpage
8
Abstract
In the last decade, data analytics have rapidly progressed from traditional disk-based processing to modern in-memory processing. However, little effort has been devoted to enhancing performance at the micro-architecture level. This paper characterizes the performance of in-memory data analytics using the Apache Spark framework. We use a single-node NUMA machine and identify the bottlenecks hampering the scalability of workloads. We also quantify the inefficiencies at the micro-architecture level for various data analysis workloads. Through empirical evaluation, we show that Spark workloads do not scale linearly beyond twelve threads, due to work time inflation and thread-level load imbalance. Further, at the micro-architecture level, we observe memory-bound latency to be the major cause of work time inflation.
Keywords
"Benchmark testing","Sparks","Instruction sets","Scalability","Servers","Data analysis","Big data"
Publisher
ieee
Conference_Titel
Big Data and Cloud Computing (BDCloud), 2015 IEEE Fifth International Conference on
Type
conf
DOI
10.1109/BDCloud.2015.37
Filename
7310708