• DocumentCode
    166710
  • Title

    Evaluating Grasp-based cloud dimensioning for comparative genomics: A practical approach

  • Author

    Coutinho, Rafaelli ; Drummond, Lucia ; Frota, Yuri ; de Oliveira, Daniel ; Ocana, Kary

  • Author_Institution
    IC/Fluminense Fed. Univ., Niteroi, Brazil
  • fYear
    2014
  • fDate
    22-26 Sept. 2014
  • Firstpage
    371
  • Lastpage
    379
  • Abstract
    Cloud computing establishes a new computing model in which a wide range of computing resources is provided to several types of users. For bioinformatics experiments modeled as scientific workflows in particular, clouds provide several types of resources, such as virtual machines (VMs), storage, databases and computing power, that can be combined to support scientific workflow execution. These workflows usually require high-performance environments and parallelism techniques, since their activities are data- and compute-intensive and can run for a long time. Some Scientific Workflow Management Systems (SWfMS) already manage the parallel execution of scientific workflows in clouds, and most of them instantiate a virtual cluster for the execution. However, they rely on the user to estimate the number of VMs to instantiate for this virtual cluster. Estimating this number is therefore crucial to avoid the negative impact of under- or overestimation on workflow performance. This dimensioning is also nontrivial in clouds because of the large number of VM types a cloud provider offers. A previously proposed approach named GraspCC already provides a near-optimal estimate of the number of VMs for general applications, but not for scientific workflows. In this paper, we couple GraspCC to the SciCumulus engine (a cloud-based parallel engine for scientific workflows) to estimate the number of VMs needed for bioinformatics workflows. We evaluated GraspCC by comparing its estimates with real executions of a set of large-scale comparative genomics workflows. The results show that GraspCC is suitable for estimating the number of VMs in real bioinformatics cloud workflows.
  • Keywords
    bioinformatics; cloud computing; genomics; virtual machines; GraspCC; SWfMS; SciCumulus; VM; bioinformatics cloud workflows; bioinformatics experiments; comparative genomics workflows; computing resources; databases; grasp-based cloud dimensioning; parallelism techniques; scientific workflow environments; scientific workflow management systems; virtual cluster; Bioinformatics; Computational modeling; Drugs; Estimation; Genomics; Hidden Markov models; Phylogeny; Bioinformatics Workflows; Cloud Computing; Virtual Machine Allocation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing (CLUSTER), 2014 IEEE International Conference on
  • Conference_Location
    Madrid
  • Type

    conf

  • DOI
    10.1109/CLUSTER.2014.6968789
  • Filename
    6968789