• DocumentCode
    599146
  • Title
    Managing data provenance in genome project workflows
  • Author
    De Paula, Ramon ; Holanda, Maristela T. ; Walter, Maria Emilia M. T. ; Lifschitz, Sergio
  • Author_Institution
    Comput. Sci. Dept., Univ. of Brasilia (UnB), Brasilia, Brazil
  • fYear
    2012
  • fDate
    4-7 Oct. 2012
  • Firstpage
    654
  • Lastpage
    661
  • Abstract
    In this article, we propose applying the PROV-DM model to manage data provenance for workflows designed to support genome projects. This provenance model stores details of each workflow execution, such as raw and produced data, computational tools and their versions, and parameters. In this way, biologists can review the details of a particular workflow execution, compare information generated across different executions, and plan new ones more efficiently. In addition, we have created a provenance simulator to facilitate the inclusion of a provenance data model in genome projects. To validate our proposal, we discuss a case study of an RNA-Seq project that aims to identify, measure, and compare RNA expression levels across liver and kidney RNA samples produced by high-throughput automatic sequencers.
  • Keywords
    RNA; biology computing; data models; genomics; kidney; liver; PROV-DM model; RNA expression levels; RNA-Seq project; data provenance management; genome project workflows; high-throughput automatic sequencers; kidney RNA; liver; provenance data model; Bioinformatics; Biological system modeling; DNA; Data models; Databases; Genomics; RNA; PROV-DM; bioinformatics; data provenance; genome project; workflow;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine Workshops (BIBMW), 2012 IEEE International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    978-1-4673-2746-6
  • Electronic_ISBN
    978-1-4673-2744-2
  • Type
    conf
  • DOI
    10.1109/BIBMW.2012.6470215
  • Filename
    6470215
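The abstract describes recording, for each workflow execution, the data used and produced, the tools and their versions, and the parameters, so that executions can later be reviewed and compared. The sketch below is a minimal illustration under stated assumptions: it models only a small subset of the PROV-DM core types (Entity, Activity, Agent) and relations (used, wasGeneratedBy, wasAssociatedWith) in plain Python. It is not the authors' schema or their provenance simulator, and every step, tool, file, and parameter name in it is hypothetical.

```python
"""Minimal sketch of PROV-DM-style provenance for workflow executions.

Assumption: only a small subset of PROV-DM is modeled. All step, tool,
file, and parameter names used here are hypothetical.
"""
from dataclasses import dataclass, field


@dataclass
class Execution:
    """Provenance records for one run of a workflow."""
    run_id: str
    entities: set = field(default_factory=set)          # data items (PROV Entity)
    activities: dict = field(default_factory=dict)      # step -> parameters (PROV Activity)
    agents: dict = field(default_factory=dict)          # step -> (tool, version) (PROV Agent)
    used: set = field(default_factory=set)              # (activity, entity) pairs
    was_generated_by: set = field(default_factory=set)  # (entity, activity) pairs

    def record_step(self, step, tool, version, params, inputs, outputs):
        """Record one workflow step: its tool, parameters, inputs, and outputs."""
        self.activities[step] = dict(params)
        self.agents[step] = (tool, version)  # stands in for wasAssociatedWith(step, tool)
        for data in inputs:
            self.entities.add(data)
            self.used.add((step, data))
        for data in outputs:
            self.entities.add(data)
            self.was_generated_by.add((data, step))

    def lineage(self, entity):
        """Return every step that directly or transitively produced an entity."""
        steps, frontier = set(), [entity]
        while frontier:
            current = frontier.pop()
            for produced, step in self.was_generated_by:
                if produced == current and step not in steps:
                    steps.add(step)
                    # Follow the inputs this step used back to their producers.
                    frontier.extend(e for s, e in self.used if s == step)
        return steps


def compare_executions(a, b):
    """Map each step whose tool version or parameters differ to (run a, run b)."""
    diffs = {}
    for step in sorted(set(a.activities) | set(b.activities)):
        before = (a.agents.get(step), a.activities.get(step))
        after = (b.agents.get(step), b.activities.get(step))
        if before != after:
            diffs[step] = (before, after)
    return diffs


# Usage: two hypothetical runs of a liver RNA-Seq workflow that differ
# only in the aligner version and one alignment parameter.
run1 = Execution("run-1")
run1.record_step("align", "aligner", "1.0", {"mismatches": 2},
                 ["liver_reads.fastq"], ["liver.bam"])
run1.record_step("quantify", "quantifier", "2.1", {},
                 ["liver.bam"], ["liver_expression.tsv"])

run2 = Execution("run-2")
run2.record_step("align", "aligner", "1.1", {"mismatches": 1},
                 ["liver_reads.fastq"], ["liver.bam"])
run2.record_step("quantify", "quantifier", "2.1", {},
                 ["liver.bam"], ["liver_expression.tsv"])

print(sorted(run1.lineage("liver_expression.tsv")))  # ['align', 'quantify']
print(list(compare_executions(run1, run2)))           # ['align']
```

Keeping the relations as plain sets of pairs mirrors the PROV-DM view of provenance as a graph of typed edges; a real deployment would more likely serialize these records in a standard PROV format or store them in a database.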