  • DocumentCode
    3525323
  • Title
    AntsBOA: A New Time Series Pipeline for Big Data Processing, Analyzing and Querying in Online Advertising Application
  • Author
    Song, Bin ; Liu, Shaosu ; Kolay, Santanu ; Lo, Lawrence
  • Author_Institution
    Turn Inc., Redwood City, CA, USA
  • fYear
    2015
  • fDate
    March 30-April 2, 2015
  • Firstpage
    223
  • Lastpage
    232
  • Abstract
    This paper presents AntsBOA, a new pipeline for big data processing, analysis and querying. The pipeline was originally designed for online advertising, but it extends easily to other big data applications. AntsBOA is built on time series technology. Its data processing has three levels: aggregation, time series and cache. Time series data and cache data are loaded into a distributed database system named Kodiak, and a query server queries this data in Kodiak and returns the results. The pipeline has been running in production for half a year. In production, the previous 16 months of performance data can be populated in less than half an hour, and queries over those 16 months of data take at most a few milliseconds on average. Our production results also show three gains. The cache level is tens of times faster than the aggregation level in query time. The time series cache level achieves a 50% speedup over the cache level in terms of Hadoop resources. Time series loading is about 10 times faster than traditional loading. The production system is also monitored to ensure that it stays healthy and stable. In summary, AntsBOA is an efficient, accurate, recoverable, scalable and fault-tolerant pipeline for big data processing, analysis and querying.
  • Keywords
    Big Data; advertising data processing; cache storage; distributed databases; query processing; time series; AntsBOA; Big Data analyzing; Big Data applications; Big Data processing; Big Data querying; Kodiak; aggregation level; cache data; cache level; distributed database system; online advertising application; query server; time series data; time series technology; Advertising; Pipelines; Servers; Time series analysis; Extract Transform Load (ETL); distributed systems; online advertising
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2015 IEEE First International Conference on Big Data Computing Service and Applications (BigDataService)
  • Conference_Location
    Redwood City, CA
  • Type
    conf
  • DOI
    10.1109/BigDataService.2015.32
  • Filename
    7184885