DocumentCode :
127584
Title :
Addressing the Shimming Problem in Big Data Scientific Workflows
Author :
Mohan, Archith; Lu, Shiyong; Kotov, Alexander
Author_Institution :
Wayne State Univ., Detroit, MI, USA
fYear :
2014
fDate :
June 27 2014-July 2 2014
Firstpage :
347
Lastpage :
354
Abstract :
A substantial amount of research has recently been done to address the shimming problem in scientific workflows, in which special adaptors, called shims, are inserted between workflow tasks to resolve data type incompatibilities. Scientific workflows are increasingly used for big data analysis and processing, which adds the big data challenges of volume, velocity, and variety to the shimming problem. One issue is scaling the registration and configuration procedure to a large number of workflow tasks. Another is easing the integration of a large number of remote Web services and other heterogeneous task components, which can consume and produce data in various formats and models, into a uniform and interoperable workflow. Existing approaches fall short in usability and scalability in addressing these issues. In this paper, we 1) propose a new, simplified single-component task model based on extensive experience and lessons learned from our original multiple-component task model; the new model separates registration from configuration and eases the registration of external functional components (such as Web services) into p-workflows; 2) propose a shim generation algorithm that elegantly solves the shimming problem raised by Web-service-based scientific workflows; and 3) integrate MongoDB, a NoSQL document-oriented database system, for storing and managing large-scale unstructured documents. A new version of the DATAVIEW system has been developed to support the proposed techniques, and a case study has been conducted to show their feasibility and usability.
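The abstract describes shims only at a high level and does not spell out the paper's shim generation algorithm. The following is a minimal Python sketch of the general idea, not the DATAVIEW implementation: it assumes a shim can be built by chaining registered single-step type converters, with the shortest chain found by breadth-first search. The names `register_converter`, `find_shim_chain`, and `make_shim`, and the type labels "csv", "rows", and "json", are hypothetical and chosen only for illustration.

```python
import json
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical converter registry: (source_type, target_type) -> conversion function.
Converter = Callable[[object], object]
CONVERTERS: Dict[Tuple[str, str], Converter] = {}


def register_converter(src: str, dst: str, fn: Converter) -> None:
    """Register a single-step conversion from data type `src` to `dst`."""
    CONVERTERS[(src, dst)] = fn


def find_shim_chain(src: str, dst: str) -> Optional[List[Tuple[str, str]]]:
    """Breadth-first search for the shortest chain of converters from src to dst.

    Returns a list of (from, to) steps, [] if the types already match,
    or None if no chain exists (the tasks cannot be connected).
    """
    if src == dst:
        return []
    queue = deque([src])
    parent: Dict[str, Tuple[str, str]] = {}
    seen = {src}
    while queue:
        current = queue.popleft()
        for (a, b) in CONVERTERS:
            if a == current and b not in seen:
                seen.add(b)
                parent[b] = (a, b)
                if b == dst:
                    # Walk parent links back to src to rebuild the chain.
                    chain = []
                    node = dst
                    while node != src:
                        step = parent[node]
                        chain.append(step)
                        node = step[0]
                    return list(reversed(chain))
                queue.append(b)
    return None


def make_shim(src: str, dst: str) -> Converter:
    """Compose the converter chain into one adaptor (shim) placed between two tasks."""
    chain = find_shim_chain(src, dst)
    if chain is None:
        raise TypeError(f"no shim available from {src} to {dst}")

    def shim(value: object) -> object:
        for step in chain:
            value = CONVERTERS[step](value)
        return value

    return shim


# Example: an upstream task emits CSV text, a downstream task expects JSON.
register_converter("csv", "rows", lambda text: [line.split(",") for line in text.strip().splitlines()])
register_converter("rows", "json", lambda rows: json.dumps(rows))

csv_to_json = make_shim("csv", "json")
print(csv_to_json("a,b\n1,2"))  # prints [["a", "b"], ["1", "2"]]
```

Breadth-first search is used here only because it returns the shortest converter chain; a real workflow system may rank candidate shims by cost, fidelity, or Web service availability instead.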
Keywords :
Big Data; SQL; Web services; data analysis; relational databases; Big Data analysis; Big Data scientific workflows; DATAVIEW system; MongoDB; NoSQL document-oriented database system; data type incompatibility; heterogeneous task components; shimming problem; Data models; Databases; Ports (Computers); Registers; XML; scientific workflow; shimming;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Services Computing (SCC), 2014 IEEE International Conference on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5065-2
Type :
conf
DOI :
10.1109/SCC.2014.53
Filename :
6930553