• DocumentCode
    168750
  • Title
    From Scripted HPC-Based NGS Pipelines to Workflows on the Cloud
  • Author
    Cala, Jacek; Xu, Yaobo; Wijaya, Eldarina Azfar; Missier, Paolo
  • Author_Institution
    Sch. of Comput. Sci., Newcastle Univ., Newcastle upon Tyne, UK
  • fYear
    2014
  • fDate
    26-29 May 2014
  • Firstpage
    694
  • Lastpage
    700
  • Abstract
    In this paper we describe our initial experiences in the Cloud-e-Genome project with moving the whole-exome sequencing pipeline from a scripted HPC-based solution to a workflow enactment system running in the cloud. We discuss the shortcomings of the existing script-based approach and list the benefits that a workflow-based solution can provide. Despite the effort involved in wrapping all required tools as workflow blocks, and the restrictions of the dataflow model used to represent workflows, we expect the migration to significantly improve the current state of the pipeline. Our aim is to make the solution flexible, traceable and reproducible, so that it can better accommodate the evolution of tools, data and the pipeline itself, and so that we can run it at national scale. This work will form the foundation of a more complete system that includes variant filtering and interpretation for diagnostic purposes.
  • Keywords
    bioinformatics; cloud computing; data flow computing; genomics; Cloud-e-Genome project; dataflow model restrictions; exome sequencing pipeline; scripted HPC-based NGS pipelines; workflow blocks; workflow enactment system; Bioinformatics; Cloud computing; Computational modeling; Genomics; Libraries; Pipelines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster, Cloud and Grid Computing (CCGrid), 2014 14th IEEE/ACM International Symposium on
  • Conference_Location
    Chicago, IL
  • Type
    conf
  • DOI
    10.1109/CCGrid.2014.128
  • Filename
    6846521