DocumentCode
3525323
Title
AntsBOA: A New Time Series Pipeline for Big Data Processing, Analyzing and Querying in Online Advertising Application
Author
Bin Song; Shaosu Liu; Santanu Kolay; Lawrence Lo
Author_Institution
Turn Inc., Redwood City, CA, USA
fYear
2015
fDate
March 30 - April 2, 2015
Firstpage
223
Lastpage
232
Abstract
This paper presents AntsBOA, a new pipeline for big data processing, analysis, and querying. The pipeline was initially designed for online advertising applications, but it is easy to extend to other big data applications. AntsBOA is built on time series technology. Its data processing comprises three levels: aggregation, time series, and cache. Time series data and cache data are loaded into a distributed database system named Kodiak. A query server then queries the data in Kodiak and returns the results. The pipeline has been running in production for half a year. In our production environment, the prior 16 months of performance data can be populated in less than half an hour, and the response time for querying those 16 months of performance data is less than several milliseconds on average. In addition, our production results show that the cache level is tens of times faster than the aggregation level in terms of query time, that the time series cache level achieves a 50% speedup over the cache level in terms of Hadoop resources, and that time series loading is about 10 times faster than traditional loading. Our production system is also monitored to ensure that it remains in a healthy and stable state. In summary, AntsBOA is an efficient, accurate, recoverable, scalable, and fault-tolerant pipeline for big data processing, analysis, and querying.
Keywords
Big Data; advertising data processing; cache storage; distributed databases; query processing; time series; AntsBOA; Big Data analyzing; Big Data applications; Big Data processing; Big Data querying; Kodiak; aggregation level; cache data; cache level; distributed database system; online advertising application; query server; time series data; time series technology; Advertising; Big data; Distributed databases; Pipelines; Servers; Time series analysis; Extract Transform Load (ETL); big data; distributed systems; online advertising; time series;
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data Computing Service and Applications (BigDataService), 2015 IEEE First International Conference on
Conference_Location
Redwood City, CA
Type
conf
DOI
10.1109/BigDataService.2015.32
Filename
7184885