• DocumentCode
    1785075
  • Title

    Storing provenance data of genome project workflows using graph database

  • Author

    Pinheiro, Rodrigo ; Aires, Bruno ; Araujo, Aleteia F. ; Holanda, Maristela ; Walter, Maria Emilia ; Lifschitz, Sergio

  • Author_Institution
    Comput. Sci. Dept., Univ. of Brasilia, Brasilia, Brazil
  • fYear
    2014
  • fDate
    2-5 Nov. 2014
  • Firstpage
    16
  • Lastpage
    22
  • Abstract
    Many scientific experiments in bioinformatics are designed as computational workflows. However, the amount of data generated increases at every phase of each execution, which makes it difficult to identify the source of the data and the transformations applied to it. Therefore, new tools are needed to store data provenance, in particular which resources and parameters were used to generate the results, among other information, so that the experiment can be validated and published. In this paper, we propose using a graph database to store the data provenance of bioinformatics workflows, modeled with PROV-DM. To validate the model, we developed a simulator that worked as a logbook to capture data provenance. A workflow with real genomic data showed that very little additional data needs to be stored, which means that our provenance model can be easily included in genome projects.
  • Keywords
    bioinformatics; data handling; data models; database management systems; genomics; storage management; PROV-DM model; bioinformatics workflows; genome project workflows; genomic data; graph database; logbook; provenance data storage; simulator; Bioinformatics; Biological system modeling; Data models; Databases; Genomics; Pipelines; PROV-DM; bioinformatics; data provenance; genome projects; graph database; workflow
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
  • Conference_Location
    Belfast
  • Type

    conf

  • DOI
    10.1109/BIBM.2014.6999292
  • Filename
    6999292