• DocumentCode
    2759940
  • Title

    Building the Trident Scientific Workflow Workbench for Data Management in the Cloud

  • Author

    Simmhan, Yogesh ; Barga, Roger ; van Ingen, C. ; Lazowska, Ed ; Szalay, Alex

  • Author_Institution
    Microsoft Res., Redmond, WA, USA
  • fYear
    2009
  • fDate
    11-16 Oct. 2009
  • Firstpage
    41
  • Lastpage
    50
  • Abstract
    Scientific workflows have gained popularity among scientists for modeling and executing in silico experiments for problem-solving. These workflows primarily engage in computation and data transformation tasks to perform scientific analysis in the Science Cloud. Increasingly, workflows are also used to manage scientific data as it arrives from external sensors and is prepared to become science-ready and available for use in the Cloud. While not directly part of the scientific analysis, these workflows, operating behind the Cloud on behalf of the "data valets", play an important role in end-to-end management of scientific data products. They share several features with traditional scientific workflows: both are data intensive and use Cloud resources. However, they also differ in significant respects, for example, in the reliability required, the scheduling constraints, and the use of the provenance collected. In this article, we investigate these two classes of workflows - Science Application workflows and Data Preparation workflows - and use these to drive common and distinct requirements from workflow systems for eScience in the Cloud. We use workflow examples from two collaborations, the NEPTUNE oceanography project and the Pan-STARRS astronomy project, to draw out our comparison. Our analysis of these workflow classes can guide the evolution of workflow systems to support emerging applications in the Cloud, and the Trident Scientific Workbench is one such workflow system that has directly benefitted from this analysis to meet the needs of these two eScience projects.
  • Keywords
    Internet; database management systems; workflow management software; NEPTUNE oceanography project; Pan-STARRS astronomy project; cloud computing; data management; data preparation workflows; eScience; science application workflows; trident scientific workflow workbench; Cloud computing; Computer applications; Conference management; Data analysis; Data engineering; Evolution (biology); Grid computing; Instruments; Resource management; Scheduling; applications in eScience; cloud computing; scientific data management; scientific workflows;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Advanced Engineering Computing and Applications in Sciences, 2009. ADVCOMP '09. Third International Conference on
  • Conference_Location
    Sliema
  • Print_ISBN
    978-1-4244-5082-4
  • Electronic_ISBN
    978-0-7695-3829-7
  • Type

    conf

  • DOI
    10.1109/ADVCOMP.2009.14
  • Filename
    5359629