Title :
Performance Study of Spindle, A Web Analytics Query Engine Implemented in Spark
Author :
Amos, Brandon ; Tompkins, David
Author_Institution :
Adobe Res. San Jose, San Jose, CA, USA
Abstract :
This paper shares our experiences building and benchmarking Spindle, an open-source, Spark-based web analytics platform. Spindle's design is motivated by real-world queries and data that require concurrent, low-latency query execution. We identify a search space of Spark tuning options and study their impact on Spark's performance. Results from a self-hosted six-node cluster with one week of analytics data (13.1 GB) indicate that tuning options such as proper partitioning can yield a 5x performance improvement.
Keywords :
public domain software; query processing; software performance evaluation; Spark tuning options; Spindle performance study; Web analytics query engine; low latency query execution; open source Spark-based Web analytics platform; real-world queries; self-hosted six node cluster; Context; Instruction sets; Libraries; Loading; Production; Sparks; Tuning; data processing; distributed systems; performance study; web analytics
Conference_Title :
Cloud Computing Technology and Science (CloudCom), 2014 IEEE 6th International Conference on
Conference_Location :
Singapore
DOI :
10.1109/CloudCom.2014.111