DocumentCode
650607
Title
Multi-query Unification for Generating Efficient Big Data Processing Components from a DFD
Author
Kimura, K. ; Nomura, Yutaka ; Kurihara, Hiroshi ; Yamamoto, Koji ; Yamamoto, Ryo
Author_Institution
Software Innovation Lab., FUJITSU Labs. Ltd., Kawasaki, Japan
fYear
2013
fDate
June 28 2013-July 3 2013
Firstpage
260
Lastpage
268
Abstract
This paper proposes multi-query unification, a technique for generating unified components from a DFD, aimed at reducing the total cost of data transmission between components that are deployed to a computing fabric that includes processing nodes and interconnection services. The method focuses on generating components for the two primary data processing methodologies: cumulative data processing (CDP) and data stream processing (DSP). It generates a unified query by applying one of two methods, depending on the order sensitivity of the processes in a DFD. Nesting unification composes a unified query by embedding the query of a process into the query of the next process as a subquery. Clause assembly unification composes a query using templates for each clause of the original query. Because clause assembly is applicable only to processes that can be executed simultaneously, we define a criterion called order sensitivity for applying clause assembly and propose two-stage unification, in which nesting unification is always applied after clause assembly. A performance evaluation based on a virtual DFD shows that applying two-stage unification reduces the execution time of components by 60 percent in DSP but by only 10 percent in CDP. On the other hand, nesting unification alone reduces the execution time by 30 percent. Based on these results, we conclude that clause assembly should be applied to DSP using Esper but should not be applied to CDP using Hive.
Keywords
query processing; very large databases; CDP; DSP; Esper; Hive; big data processing components; clause assembly unification; cumulative data processing; data stream processing; data transmission; interconnection services; multiquery unification; nesting unification; order sensitivity; query processing; total cost reduction; two-stage unification; unified query; virtual DFD; Assembly; Data analysis; Data models; Digital signal processing; Engines; Sensitivity; DFD; big data; component; multi-query unification; order sensitivity; platform as a service;
fLanguage
English
Publisher
IEEE
Conference_Titel
Cloud Computing (CLOUD), 2013 IEEE Sixth International Conference on
Conference_Location
Santa Clara, CA
Print_ISBN
978-0-7695-5028-2
Type
conf
DOI
10.1109/CLOUD.2013.99
Filename
6676703