  • DocumentCode
    610441
  • Title
    COLA: A cloud-based system for online aggregation
  • Author
    Yantao Gan ; Xiaofeng Meng ; Yingjie Shi
  • Author_Institution
    Sch. of Inf., Renmin Univ. of China, Beijing, China
  • fYear
    2013
  • fDate
    8-12 April 2013
  • Firstpage
    1368
  • Lastpage
    1371
  • Abstract
    Online aggregation is a promising approach to delivering fast early responses for interactive ad-hoc queries that compute aggregates over massive data. MapReduce has become a popular paradigm for processing large datasets on large-scale computing clusters in many data analysis applications. However, typical MapReduce implementations are not well suited to analytic tasks, since they are geared towards batch processing. With the increasing popularity of ad-hoc analytic query processing over enormous datasets, processing aggregate queries with MapReduce in an online fashion is therefore an important emerging application need. We present a MapReduce-based online aggregation system called COLA, which provides progressive approximate aggregate answers for queries over both single tables and multiple joined tables. COLA provides an online aggregation execution engine with novel sampling techniques that support incremental and continuous computation of aggregates and minimize the waiting time before an acceptably precise estimate is available. In addition, COLA supports user-friendly SQL queries. Furthermore, COLA can implicitly convert non-OLA jobs into online versions, so users don't have to write any special-purpose code to obtain estimates.
  • Keywords
    SQL; batch processing (computers); cloud computing; data handling; query processing; COLA; MapReduce implementations; MapReduce-based online aggregation system; batch processing; cloud based system for online aggregation; data analysis applications; interactive adhoc queries; large dataset process; large scale computing clusters; novel sampling techniques; online aggregation execution engine; user-friendly SQL queries; Aggregates; Electronic publishing; Encyclopedias; Engines; Internet; Query processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering (ICDE), 2013 IEEE 29th International Conference on
  • Conference_Location
    Brisbane, QLD
  • ISSN
    1063-6382
  • Print_ISBN
    978-1-4673-4909-3
  • Electronic_ISBN
    1063-6382
  • Type
    conf
  • DOI
    10.1109/ICDE.2013.6544946
  • Filename
    6544946