Title :
Graph-Cut Based Coscheduling Strategy Towards Efficient Execution of Scientific Workflows in Collaborative Cloud Environments
Author :
Deng, Kefeng ; Song, Junqiang ; Ren, Kaijun ; Yuan, Dong ; Chen, Jinjun
Author_Institution :
Sch. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
Cloud computing has recently emerged as a promising infrastructure for running scientific workflows because it provides on-demand resources. It also facilitates scientific collaboration, since the different cloud environments used by researchers are connected through the Internet. However, the significant latency caused by frequent access to large datasets and the corresponding data movement across geo-distributed data centers remains an obstacle to the efficient execution of data-intensive scientific workflows. In this paper, we propose a novel graph-cut based data and task coscheduling strategy that minimizes data transfer across geo-distributed data centers. Specifically, a dependency graph is first constructed from workflow provenance and cut into subgraphs by a multiway cut algorithm, according to the datasets that must reside in fixed data centers. The subgraphs may then be recursively cut into smaller ones by a minimum cut algorithm, guided by data correlation rules, until each of them fits the capacity constraints of the data center where its fixed-location datasets reside. In this way, the datasets and tasks are distributed to target data centers while the total amount of data transfer between them is minimized. Additionally, a runtime scheduling algorithm dynamically adjusts the data placement during execution to prevent the data centers from overloading. Simulation results demonstrate that the total volume of data transfer across data centers can be significantly reduced, which correspondingly lowers the cost of running scientific workflows on clouds.
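The recursive cutting step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes an undirected weighted dependency graph stored as nested dictionaries, uses the Stoer-Wagner algorithm as a stand-in for the unspecified minimum cut algorithm, applies a single shared capacity to every part, and omits the initial multiway cut on fixed-location datasets, the data correlation rules, and the runtime adjustment. The names `stoer_wagner`, `partition`, `graph`, `size`, and `capacity` are hypothetical.

```python
def stoer_wagner(graph):
    """Global minimum cut of an undirected weighted graph.

    graph: {node: {neighbor: weight}}, symmetric.
    Returns (cut_weight, set_of_nodes_on_one_side).
    """
    g = {u: dict(nbrs) for u, nbrs in graph.items()}   # working copy
    merged = {u: [u] for u in g}                       # original nodes folded into u
    best_weight, best_side = float("inf"), []
    while len(g) > 1:
        # Minimum cut phase: repeatedly add the most tightly connected node.
        start = next(iter(g))
        added = [start]
        weights = {v: g[start].get(v, 0) for v in g if v != start}
        while weights:
            nxt = max(weights, key=weights.get)
            cut_of_phase = weights.pop(nxt)
            added.append(nxt)
            for v, w in g[nxt].items():
                if v in weights:
                    weights[v] += w
        s, t = added[-2], added[-1]
        if cut_of_phase < best_weight:
            best_weight, best_side = cut_of_phase, list(merged[t])
        # Merge t into s before the next phase.
        merged[s].extend(merged[t])
        for v, w in g[t].items():
            if v != s:
                g[s][v] = g[s].get(v, 0) + w
                g[v][s] = g[s][v]
            del g[v][t]
        del g[t]
        del merged[t]
    return best_weight, set(best_side)


def partition(graph, size, capacity):
    """Recursively split the graph by minimum cuts until each part fits capacity.

    A single node larger than capacity is returned as its own part.
    """
    nodes = set(graph)
    if len(nodes) == 1 or sum(size[n] for n in nodes) <= capacity:
        return [nodes]
    _, side = stoer_wagner(graph)
    parts = []
    for group in (side, nodes - side):
        sub = {u: {v: w for v, w in graph[u].items() if v in group} for u in group}
        parts.extend(partition(sub, size, capacity))
    return parts


# Toy example: edge weights stand for the data volume shared between items.
graph = {
    "a": {"b": 10},
    "b": {"a": 10, "c": 1},
    "c": {"b": 1, "d": 10},
    "d": {"c": 10},
}
size = {"a": 1, "b": 1, "c": 1, "d": 1}
parts = partition(graph, size, capacity=2)
print(sorted(sorted(p) for p in parts))
```

In this toy graph the lightest edge (`b`-`c`, weight 1) is the one cut, so the heavily coupled pairs stay together and only one unit of data would cross between the two parts.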
Keywords :
cloud computing; graph theory; groupware; scientific information systems; Internet; collaborative cloud environment; computing infrastructure; data movement; data transfer; data-intensive scientific workflow; dependency graph; geo-distributed data center; graph-cut based coscheduling strategy; graph-cut based data; minimum cut algorithm; multiway cut algorithm; on-demand resources; runtime scheduling algorithm; scientific collaboration; workflow provenance; Contracts; Distributed databases; Heuristic algorithms; Joining processes; Partitioning algorithms; Runtime; data and task coscheduling; graph-cut algorithm; scientific workflow
Conference_Title :
Grid Computing (GRID), 2011 12th IEEE/ACM International Conference on
Conference_Location :
Lyon
Print_ISBN :
978-1-4577-1904-2
DOI :
10.1109/Grid.2011.14