Title :
SASM: Improving Spark performance with Adaptive Skew Mitigation
Author :
Jiadong Yu; Haopeng Chen; Fei Hu
Author_Institution :
School of Software, Shanghai Jiao Tong University, China
Abstract :
Skew is common in parallel computing platforms; it prolongs overall job completion time and leaves many resources idle. We present Spark Adaptive Skew Mitigation (SASM), an automatic skew-mitigation system that is transparent to Spark users and existing Spark applications. SASM dynamically mitigates skew in shuffle reads and uneven distribution of computation, using metadata collected beforehand. When a new task is registered, unprocessed blocks of straggling tasks are repartitioned to other idle tasks so that the nodes are fully utilized. We evaluate its effectiveness with several applications. The results show that SASM reduces job runtime in the presence of skew with insignificant overhead, and that it can handle imbalance caused by heterogeneous clusters or network congestion.
Keywords :
"Clustering algorithms","Heuristic algorithms","Sparks"
Conference_Title :
Progress in Informatics and Computing (PIC), 2015 IEEE International Conference on
Print_ISBN :
978-1-4673-8086-7
DOI :
10.1109/PIC.2015.7489818