Title :
Chronos: An elastic parallel framework for stream benchmark generation and simulation
Author :
Ling Gu ; Minqi Zhou ; Zhenjie Zhang ; Ming-Chien Shan ; Aoying Zhou ; Marianne Winslett
Author_Institution :
Shanghai Key Lab. of Trustworthy Comput., East China Normal Univ., Shanghai, China
Abstract :
In the coming big data era, stress testing IT systems under extreme data volumes is crucial to the adoption of computing technologies in every corner of the cyber world. Appropriately generated benchmark datasets allow administrators to evaluate the capacity of their systems when real datasets, which are hard to obtain, do not contain extreme cases. Traditional benchmark data generators, however, mainly aim at producing relational tables of arbitrary size following fixed distributions. The output of such generators is insufficient for measuring the stability of an architecture under extremely dynamic and heavy workloads caused by complicated or hidden factors in the real-world generation mechanism, e.g., dependencies between stocks in the trading market and collaborative human behaviors on social networks. In this paper, we present a new framework, called Chronos, to support these new demands on streaming data benchmarking by generating and simulating realistic and fast data streams in an elastic manner. Given a small group of samples with timestamps, Chronos reproduces new data streams with characteristics similar to those of the samples, simultaneously preserving column-wise correlations, temporal dependency, and order statistics of the snapshot distributions. To meet these requirements, we propose 1) a column decomposition optimization technique that partitions the original relation table into small sub-tables with minimal loss of correlation information, 2) a generative and extensible model based on Latent Dirichlet Allocation that captures temporal dependency while preserving order statistics of the snapshot distribution, and 3) a new generation and assembling method that efficiently builds tuples following the expected distribution on the snapshots. 
To fulfill the vision of elasticity, we also present a new parallel stream data generation mechanism that enables distributed nodes to collaboratively generate tuples with minimal synchronization overhead and excellent load balancing. Our extensive experimental studies on real-world data domains confirm the efficiency and effectiveness of Chronos for stream benchmark generation and simulation.
Keywords :
Big Data; optimisation; parallel processing; program assemblers; resource allocation; Chronos framework; IT systems; assembling method; benchmark data generators; collaborative human behaviors; column decomposition optimization technique; column-wise correlations; cyber world; elastic parallel framework; extreme data volume; fast data streams; latent Dirichlet allocation; load balancing; minimal correlation information loss; minimal synchronization overhead; order statistics; parallel stream data generation mechanism; snapshot distributions; social network; stream benchmark generation; temporal dependency; timestamps; trading market; Benchmark testing; Complexity theory; Computational modeling; Correlation; Distributed databases; Generators;
Conference_Title :
Data Engineering (ICDE), 2015 IEEE 31st International Conference on
Conference_Location :
Seoul
DOI :
10.1109/ICDE.2015.7113276