• DocumentCode
    2297634
  • Title
    Graph-Cut Based Coscheduling Strategy Towards Efficient Execution of Scientific Workflows in Collaborative Cloud Environments
  • Author
    Deng, Kefeng ; Song, Junqiang ; Ren, Kaijun ; Yuan, Dong ; Chen, Jinjun
  • Author_Institution
    Sch. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha, China
  • fYear
    2011
  • fDate
    21-23 Sept. 2011
  • Firstpage
    34
  • Lastpage
    41
  • Abstract
    Recently, cloud computing has emerged as a promising infrastructure for executing scientific workflows by providing on-demand resources. It is also convenient for scientific collaboration, since the different cloud environments used by researchers are connected through the Internet. However, the significant latency arising from frequent access to large datasets, and the corresponding data movement across geo-distributed data centers, hinders the efficient execution of data-intensive scientific workflows. In this paper, we propose a novel graph-cut based data and task coscheduling strategy for minimizing data transfer across geo-distributed data centers. Specifically, a dependency graph is first constructed from workflow provenance and cut into subgraphs by a multiway cut algorithm, according to the datasets that must reside in fixed data centers. The subgraphs may then be recursively cut into smaller ones by a minimum cut algorithm, guided by data correlation rules, until all of them fit the capacity constraints of the data centers where the fixed-location datasets reside. In this way, the datasets and tasks are distributed to target data centers while the total amount of data transfer between them is minimized. Additionally, a runtime scheduling algorithm is used to dynamically adjust the data placement during execution to prevent the data centers from overloading. Simulation results demonstrate that the total volume of data transfer across data centers can be significantly reduced, with a corresponding reduction in the cost of executing scientific workflows on clouds.
  • Keywords
    cloud computing; graph theory; groupware; scientific information systems; Internet; cloud computing; collaborative cloud environment; computing infrastructure; data movement; data transfer; data-intensive scientific workflow; dependency graph; geo-distributed data center; graph-cut based coscheduling strategy; graph-cut based data; minimum cut algorithm; multiway cut algorithm; on-demand resources; runtime scheduling algorithm; scientific collaboration; workflow provenance; Cloud computing; Contracts; Distributed databases; Heuristic algorithms; Joining processes; Partitioning algorithms; Runtime; cloud computing; data and task coscheduling; graph-cut algorithm; scientific workflow;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Grid Computing (GRID), 2011 12th IEEE/ACM International Conference on
  • Conference_Location
    Lyon
  • ISSN
    1550-5510
  • Print_ISBN
    978-1-4577-1904-2
  • Type
    conf
  • DOI
    10.1109/Grid.2011.14
  • Filename
    6076496