DocumentCode :
2441569
Title :
A cost-effective strategy for intermediate data storage in scientific cloud workflow systems
Author :
Yuan, Dong ; Yang, Yun ; Liu, Xiao ; Chen, Jinjun
Author_Institution :
Fac. of Inf. & Commun. Technol., Swinburne Univ. of Technol., Melbourne, VIC, Australia
fYear :
2010
fDate :
19-23 April 2010
Firstpage :
1
Lastpage :
12
Abstract :
Many scientific workflows are data intensive where a large volume of intermediate data is generated during their execution. Some valuable intermediate data need to be stored for sharing or reuse. Traditionally, they are selectively stored according to the system storage capacity, determined manually. As doing science on cloud has become popular nowadays, more intermediate data can be stored in scientific cloud workflows based on a pay-for-use model. In this paper, we build an Intermediate data Dependency Graph (IDG) from the data provenances in scientific workflows. Based on the IDG, we develop a novel intermediate data storage strategy that can reduce the cost of the scientific cloud workflow system by automatically storing the most appropriate intermediate datasets in the cloud storage. We utilise Amazon's cost model and apply the strategy to an astrophysics pulsar searching scientific workflow for evaluation. The results show that our strategy can reduce the overall cost of scientific cloud workflow execution significantly.
Keywords :
distributed processing; natural sciences computing; storage management; Amazon cost model; astrophysics pulsar searching scientific workflow; data storage strategy; intermediate data dependency graph; intermediate data storage; pay-for-use model; scientific cloud workflow systems; scientific workflows; system storage capacity; Astrophysics; Australia; Cloud computing; Collaborative work; Communications technology; Costs; High performance computing; Internet; Memory; Storage automation; cloud computing; cost; data storage; scientific workflow;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on
Conference_Location :
Atlanta, GA
ISSN :
1530-2075
Print_ISBN :
978-1-4244-6442-5
Type :
conf
DOI :
10.1109/IPDPS.2010.5470453
Filename :
5470453