DocumentCode :
127515
Title :
A System Architecture for Running Big Data Workflows in the Cloud
Author :
Kashlev, Andrey ; Shiyong Lu
Author_Institution :
Dept. of Comput. Sci., Wayne State Univ., Wayne, MI, USA
fYear :
2014
fDate :
June 27 2014-July 2 2014
Firstpage :
51
Lastpage :
58
Abstract :
Scientific workflows have become an important paradigm for domain scientists to formalize and structure complex data-intensive scientific processes. The ever-increasing volumes of scientific data motivate researchers to extend scientific workflow management systems (SWFMSs) to utilize the power of Cloud computing to perform big data analyses. Unlike workflows run in traditional on-premise environments such as stand-alone workstations or grids, Cloud workflows rely on dynamically provisioned computing, storage and network resources that are terminated when no longer used. This dynamic and volatile nature of cloud resources, as well as other cloud-specific factors, introduces a new set of challenges for "Cloud-enabled" SWFMSs. Although a few SWFMSs have been integrated with Cloud infrastructures, providing some experience for future research and development, a comprehensive study from an architectural perspective is still missing. To this end, we conduct a hands-on study by running a big data workflow in Amazon EC2, FutureGrid Eucalyptus and OpenStack clouds. From this experience we 1) identify the key challenges for running big data workflows in the cloud, 2) propose a generic implementation-independent system architecture that addresses these challenges, and 3) develop a cloud-enabled SWFMS called DATAVIEW that delivers a specific implementation of the proposed architecture. Finally, to validate our proposed architecture, we conduct a case study in which we design and run a big data workflow towards addressing an EB-scale big data analysis problem in the automotive industry domain.
Keywords :
Big Data; cloud computing; data analysis; scientific information systems; workflow management software; Amazon EC2; DATAVIEW; EB-scale big data analysis problem; FutureGrid Eucalyptus; OpenStack clouds; automotive industry domain; big data workflows; cloud computing; cloud workflows; cloud-enabled SWFMSs; complex data-intensive scientific processes; dynamically provisioned computing; dynamically provisioned network resources; dynamically provisioned storage; implementation-independent system architecture; scientific data; scientific workflow management systems; Big data; Computer architecture; Engines; Monitoring; Runtime; Servers; Virtual machining; big data; cloud; component; scientific workflow;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Services Computing (SCC), 2014 IEEE International Conference on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5065-2
Type :
conf
DOI :
10.1109/SCC.2014.16
Filename :
6930516