• DocumentCode
    650607
  • Title

    Multi-query Unification for Generating Efficient Big Data Processing Components from a DFD

  • Author

    Kimura, K. ; Nomura, Yutaka ; Kurihara, Hiroshi ; Yamamoto, Koji ; Yamamoto, Ryo

  • Author_Institution
    Software Innovation Lab., FUJITSU Labs. Ltd., Kawasaki, Japan
  • fYear
    2013
  • fDate
    June 28 2013-July 3 2013
  • Firstpage
    260
  • Lastpage
    268
  • Abstract
    This paper proposes multi-query unification, a technique for generating unified components from a DFD, aimed at reducing the total cost of data transmission between components deployed to a computing fabric that includes processing nodes and interconnection services. The method focuses on generating components for the two primary data processing methodologies: cumulative data processing (CDP) and data stream processing (DSP). It generates a unified query by applying two methods depending on the order sensitivity of the processes in a DFD. Nesting unification composes a unified query by embedding the query of a process into the query of the next process as a subquery. Clause assembly unification composes a query using templates for each clause of the original query. Because clause assembly is applicable only to processes that are executable simultaneously, we define a criterion called order sensitivity for applying clause assembly and propose two-stage unification, in which nesting unification is always applied after clause assembly. A performance evaluation based on a virtual DFD shows that applying two-stage unification reduces the execution time of components by 60 percent in DSP; in CDP, however, execution time is reduced by only 10 percent. On the other hand, nesting unification alone reduces the execution time by 30 percent. Based on those results, we conclude that clause assembly should be applied to DSP using Esper but should not be applied to CDP using Hive.
  • Keywords
    query processing; very large databases; CDP; DSP; Esper; Hive; big data processing components; clause assembly unification; cumulative data processing; data stream processing; data transmission; interconnection services; multiquery unification; nesting unification; order sensitivity; total cost reduction; two-stage unification; unified query; virtual DFD; Assembly; Data analysis; Data models; Digital signal processing; Engines; Sensitivity; DFD; big data; component; multi-query unification; platform as a service
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud Computing (CLOUD), 2013 IEEE Sixth International Conference on
  • Conference_Location
    Santa Clara, CA
  • Print_ISBN
    978-0-7695-5028-2
  • Type

    conf

  • DOI
    10.1109/CLOUD.2013.99
  • Filename
    6676703
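
The two unification methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary query representation, the function names `render`, `nest`, and `assemble`, and the example tables and columns are all hypothetical, and clause assembly is simplified here to merging the WHERE clauses of two filter processes that are assumed to be order-insensitive and to select the same columns.

```python
# Hypothetical sketch of nesting unification and clause assembly.
# A query is modeled as a dict with "select", "from", and "where" clauses.

def render(q):
    """Render a query dict as a SQL-like string."""
    sql = f"SELECT {', '.join(q['select'])} FROM {q['from']}"
    if q["where"]:
        sql += " WHERE " + " AND ".join(q["where"])
    return sql

def nest(inner, outer):
    """Nesting unification: embed the inner query as a subquery of the next one."""
    unified = dict(outer)
    unified["from"] = f"({render(inner)}) AS t"
    return unified

def assemble(a, b):
    """Clause assembly (simplified assumption): build one query clause by clause
    from two order-insensitive filters, reading from a's source."""
    return {
        "select": b["select"],
        "from": a["from"],
        "where": a["where"] + b["where"],
    }

# Three consecutive processes of an imagined DFD; p1 and p2 are plain filters.
p1 = {"select": ["*"], "from": "events", "where": ["temp > 30"]}
p2 = {"select": ["*"], "from": "p1_out", "where": ["site = 'A'"]}
p3 = {"select": ["id", "temp"], "from": "p2_out", "where": []}

# Two-stage unification: clause assembly first, then nesting.
two_stage = render(nest(assemble(p1, p2), p3))
# Nesting unification alone, for comparison.
nested_only = render(nest(nest(p1, p2), p3))
print(two_stage)
print(nested_only)
```

In this sketch, two-stage unification yields a single subquery level, whereas nesting alone yields one level per process; which form runs faster depends on the engine, as the abstract's Esper and Hive results indicate.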