DocumentCode
1785075
Title
Storing provenance data of genome project workflows using graph database
Author
Pinheiro, Rodrigo ; Aires, Bruno ; Araujo, Aleteia F. ; Holanda, Maristela ; Walter, Maria Emilia ; Lifschitz, Sergio
Author_Institution
Comput. Sci. Dept., Univ. of Brasilia, Brasilia, Brazil
fYear
2014
fDate
2-5 Nov. 2014
Firstpage
16
Lastpage
22
Abstract
Many scientific experiments in bioinformatics are designed as computational workflows. However, the amount of data generated increases at every phase of each execution, which makes it hard to identify the source of the data and the transformations applied to it. It has therefore become necessary to create new tools to store data provenance, mainly which resources and parameters were used to generate the results, among other information, in order to validate and publish the experiment. In this paper, we propose using a graph database to store the data provenance of bioinformatics workflows, modeled with PROV-DM. To validate the model, we developed a simulator that worked as a logbook to capture data provenance. A workflow with real genomic data showed that very little additional data needs to be stored, which means that our provenance model can be easily included in genome projects.
Keywords
bioinformatics; data handling; data models; database management systems; genomics; storage management; PROV-DM model; bioinformatics workflows; genome project workflows; genomic data; graph database; logbook; provenance data storage; simulator; Bioinformatics; Biological system modeling; Data models; Databases; Genomics; Pipelines; PROV-DM; bioinformatics; data provenance; genome projects; graph database; workflow;
fLanguage
English
Publisher
ieee
Conference_Titel
Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
Conference_Location
Belfast
Type
conf
DOI
10.1109/BIBM.2014.6999292
Filename
6999292