DocumentCode
3143707
Title
Towards Intelligent Data Placement for Scientific Workflows in Collaborative Cloud Environment
Author
Liu, Xin ; Datta, Anwitaman
Author_Institution
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
fYear
2011
fDate
16-20 May 2011
Firstpage
1052
Lastpage
1061
Abstract
Cloud computing has recently emerged as a promising platform for executing scientific workflow applications, offering performance comparable to the grid along with lower cost and elasticity, among other benefits. Collaborative cloud environments, which share the resources of multiple geographically distributed data centers owned by different organizations, enable researchers from all over the world to conduct large-scale, data-intensive research together over the Internet. However, since scientific workflows consume and generate huge amounts of data, it is essential to manage the data effectively to achieve high performance and cost effectiveness. In this paper, we propose an intelligent data placement strategy to improve the performance of workflows while minimizing data transfer among data centers. Specifically, at the startup stage, the whole dataset is divided into small data items, which are then distributed among multiple data centers by considering the data centers' computation capability, storage budget, data item correlation, etc. During the runtime stage, when intermediate data is generated, it is placed on suitable data centers using linear discriminant analysis, taking into account the same metrics as at the startup stage as well as the data centers' past behavior (i.e., trustworthiness in terms of task delay). Simulation results demonstrate the promise of our data placement strategy: compared to existing data placement strategies, our proposal places data so as to improve overall computation progress while minimizing the communication overhead incurred by data movement.
Keywords
cloud computing; data handling; groupware; scientific information systems; workflow management software; Internet; cloud computing; collaborative cloud environment; data item; data management; data transfer; geographically distributed data center; intelligent data placement; linear discriminant analysis; scientific workflow; workflow performance; Cloud computing; Clustering algorithms; Correlation; Delay; Distributed databases; Organizations; Runtime
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on
Conference_Location
Shanghai
ISSN
1530-2075
Print_ISBN
978-1-61284-425-1
Electronic_ISBN
1530-2075
Type
conf
DOI
10.1109/IPDPS.2011.259
Filename
6008893