Author_Institution :
National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China
Abstract :
OLAP queries are an efficient way to gain quick insight into big data. Spark is a fast, general-purpose engine for big data processing that supports interactive OLAP queries. Because Spark is a general-purpose engine, however, many parameters affect its performance, so it is necessary to study which settings yield better performance in a specific scenario. In this paper, we choose typical queries from real scenarios and present measurement results obtained by running these queries on a real dataset of up to 7 TB on a 32-node cluster. After tuning, the queries show a clear performance improvement over the default settings.