• DocumentCode
    659418
  • Title

    Algebraic dataflows for big data analysis

  • Author

    Dias, Joana ; Ogasawara, Eduardo ; de Oliveira, Daniel ; Porto, F. ; Valduriez, Patrick ; Mattoso, Marta

  • Author_Institution
    Fed. Univ. of Rio de Janeiro - COPPE/UFRJ, Rio de Janeiro, Brazil
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    150
  • Lastpage
    155
  • Abstract
    Analyzing big data requires the support of dataflows with many activities to extract and explore relevant information from the data. Recent approaches such as Pig Latin propose a high-level language to model such dataflows. However, the dataflow execution is typically delegated to a MapReduce implementation such as Hadoop, which does not follow an algebraic approach and thus cannot take advantage of the optimization opportunities of the Pig Latin algebra. In this paper, we propose an approach for big data analysis based on algebraic workflows, which yields optimization and parallel execution of activities and supports user steering using provenance queries. We illustrate how a big data processing dataflow can be modeled using the algebra. Through an experimental evaluation using real datasets and the execution of the dataflow with Chiron, an engine that supports our algebra, we show that our approach yields performance gains of up to 19.6% using algebraic optimizations in the dataflow and up to 39.1% of time saved in a user steering scenario.
  • Keywords
    Big Data; data analysis; data flow computing; high level languages; query processing; Chiron; Hadoop; MapReduce implementation; PigLatin algebra; activities optimization; activities parallel execution; algebraic dataflow execution; algebraic optimizations; big data analysis; big data processing dataflow; high-level language; information extraction; provenance queries; user steering scenario; Algebra; Data handling; Data storage systems; History; Information management; Optimization; Runtime; algebraic workflow; big data; dataflow; performance evaluation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data, 2013 IEEE International Conference on
  • Conference_Location
    Silicon Valley, CA
  • Type

    conf

  • DOI
    10.1109/BigData.2013.6691567
  • Filename
    6691567