Multi-query Unification for Generating Efficient Big Data Processing Components from a DFD

Author

Kimura, K. ; Nomura, Yutaka ; Kurihara, Hiroshi ; Yamamoto, Koji ; Yamamoto, Ryo

Author_Institution

Software Innovation Lab., FUJITSU Labs. Ltd., Kawasaki, Japan

fYear

2013

fDate

June 28 2013-July 3 2013

Firstpage

260

Lastpage

268

Abstract

This paper proposes multi-query unification, a technique for generating unified components from a data flow diagram (DFD). The technique aims to reduce the total cost of data transmission between components that are deployed to a computing fabric that includes processing nodes and interconnection services. The method focuses on generating components for the two primary data processing methodologies: cumulative data processing (CDP) and data stream processing (DSP). It generates a unified query by applying two methods, depending on the order sensitivity of processes in a DFD. Nesting unification composes a unified query by embedding the query of a process into the query of the next process as a subquery. Clause assembly unification composes a query using templates for each clause of the original query. Because clause assembly is applicable only to processes that can be executed simultaneously, we define a criterion called order sensitivity for applying clause assembly and propose two-stage unification, in which nesting unification is always applied after clause assembly. The performance evaluation based on a virtual DFD shows that applying two-stage unification reduces the execution time of components by 60 percent in DSP but by only 10 percent in CDP. On the other hand, nesting unification alone reduces the execution time by 30 percent. Based on these results, we conclude that clause assembly should be applied to DSP using Esper but should not be applied to CDP using Hive.
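
The two unification methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `nest` and `assemble`, the `{src}` placeholder, the alias scheme `t1, t2, ...`, and the choice to combine filter predicates with AND are all assumptions made for illustration, and the example table `sales` is hypothetical.

```python
from typing import Dict, List


def nest(queries: List[str], source: str) -> str:
    """Nesting unification (sketch): embed each query into the next as a subquery.

    Each query string must contain the placeholder {src} marking its input.
    The {src} convention and the t<i> aliases are assumptions of this sketch.
    """
    unified = source
    for i, query in enumerate(queries):
        if i == 0:
            # The first process reads directly from the source table.
            unified = query.format(src=unified)
        else:
            # Later processes read from the previous query as a subquery.
            unified = query.format(src=f"({unified}) t{i}")
    return unified


def assemble(processes: List[Dict[str, List[str]]], source: str,
             columns: List[str]) -> str:
    """Clause assembly (sketch): fill one template per clause.

    Assumes every process is an order-insensitive filter, so the predicates
    can be collected into a single WHERE clause and joined with AND.
    """
    predicates: List[str] = []
    for process in processes:
        predicates.extend(process.get("where", []))
    query = f"SELECT {', '.join(columns)} FROM {source}"
    if predicates:
        query += " WHERE " + " AND ".join(f"({p})" for p in predicates)
    return query


if __name__ == "__main__":
    print(nest(
        ["SELECT id, price FROM {src} WHERE price > 100",
         "SELECT id FROM {src} WHERE id < 50"],
        "sales",
    ))
    print(assemble(
        [{"where": ["price > 100"]}, {"where": ["region = 'JP'"]}],
        "sales",
        ["id"],
    ))
```

In this sketch, nesting preserves the order of the processes, whereas clause assembly merges them into a single pass over the source, which is only valid when the processes can be executed in any order.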

Keywords

query processing; very large databases; CDP; DSP; Esper; Hive; big data processing components; clause assembly unification; cumulative data processing; data stream processing; data transmission; interconnection services; multiquery unification; nesting unification; order sensitivity; total cost reduction; two-stage unification; unified query; virtual DFD; Assembly; Data analysis; Data models; Digital signal processing; Engines; Sensitivity; DFD; big data; component; multi-query unification; platform as a service

fLanguage

English

Publisher

IEEE

Conference_Titel

Cloud Computing (CLOUD), 2013 IEEE Sixth International Conference on

Conference_Location

Santa Clara, CA

Print_ISBN

978-0-7695-5028-2

Type

conf

DOI

10.1109/CLOUD.2013.99

Filename

6676703