• DocumentCode
    3006895
  • Title

    Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotations

  • Author

    Alper, Pinar ; Belhajjame, Khalid ; Goble, Carole ; Karagoz, Pinar

  • Author_Institution
    Sch. of Comput. Sci., Univ. of Manchester, Manchester, UK
  • fYear
    2013
  • fDate
    June 27 2013-July 2 2013
  • Firstpage
    318
  • Lastpage
    325
  • Abstract
    Scientific workflows have become the workhorse of Big Data analytics for scientists. As well as being repeatable and optimizable pipelines that bring together datasets and analysis tools, workflows make up an important part of the provenance of data generated from their execution. By faithfully capturing all stages in the analysis, workflows play a critical part in building up the audit-trail (a.k.a. provenance) meta-data for derived datasets and contribute to the veracity of results. Provenance is essential for reporting results, reporting the method followed, and adapting to changes in the datasets or tools. These functions, however, are hampered by the complexity of workflows and consequently the complexity of data-trails generated from their instrumented execution. In this paper we propose the generation of workflow description summaries in order to tackle workflow complexity. We elaborate reduction primitives for summarizing workflows, and show how primitives, as building blocks, can be used in conjunction with semantic workflow annotations to encode different summarization strategies. We report on the effectiveness of the method through experimental evaluation using real-world workflows from the Taverna system.
  • Keywords
    meta data; natural sciences computing; workflow management software; Taverna; audit-trail meta-data; big data analytics; data provenance; data-trail complexity; scientific workflow summarization; semantic workflow annotations; workflow complexity; workflow description summary generation; Complexity theory; Libraries; Ontologies; Organizations; Pipelines; Ports (Computers); Semantics; Annotation; Motif; Rule-Based Summarization; Scientific Workflow
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (BigData Congress), 2013 IEEE International Congress on
  • Conference_Location
    Santa Clara, CA
  • Print_ISBN
    978-0-7695-5006-0
  • Type

    conf

  • DOI
    10.1109/BigData.Congress.2013.49
  • Filename
    6597153