DocumentCode :
3143707
Title :
Towards Intelligent Data Placement for Scientific Workflows in Collaborative Cloud Environment
Author :
Liu, Xin ; Datta, Anwitaman
Author_Institution :
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
fYear :
2011
fDate :
16-20 May 2011
Firstpage :
1052
Lastpage :
1061
Abstract :
Cloud computing has recently emerged as a promising platform for executing scientific workflow applications, owing to performance comparable to that of the grid, lower cost, elasticity, and other advantages. Collaborative cloud environments, which share the resources of multiple geographically distributed data centers owned by different organizations, enable researchers from all over the world to conduct large-scale, data-intensive research together over the Internet. However, since scientific workflows consume and generate huge amounts of data, it is essential to manage that data effectively to achieve high performance and cost effectiveness. In this paper, we propose an intelligent data placement strategy that improves the performance of workflows while minimizing data transfer among data centers. Specifically, at the startup stage, the whole dataset is divided into small data items, which are then distributed among multiple data centers by considering the data centers' computation capability, storage budget, data item correlation, etc. During the runtime stage, when intermediate data is generated, it is placed on suitable data centers using linear discriminant analysis, taking into account the same metrics as at the startup stage as well as the data centers' past behavior (i.e., trustworthiness in terms of task delay). Simulation results demonstrate the promise of our data placement strategy: compared to existing data placement strategies, our proposal places the data so as to improve overall computation progress while minimizing the communication overhead incurred by data movement.
Keywords :
cloud computing; data handling; groupware; scientific information systems; workflow management software; Internet; cloud computing; collaborative cloud environment; data item; data management; data transfer; geographically distributed data center; intelligent data placement; linear discriminant analysis; scientific workflow; workflow performance; Cloud computing; Clustering algorithms; Correlation; Delay; Distributed databases; Organizations; Runtime;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on
Conference_Location :
Shanghai
ISSN :
1530-2075
Print_ISBN :
978-1-61284-425-1
Electronic_ISBN :
1530-2075
Type :
conf
DOI :
10.1109/IPDPS.2011.259
Filename :
6008893