DocumentCode :
1785075
Title :
Storing provenance data of genome project workflows using graph database
Author :
Pinheiro, Rodrigo ; Aires, Bruno ; Araujo, Aleteia F. ; Holanda, Maristela ; Walter, Maria Emilia ; Lifschitz, Sergio
Author_Institution :
Comput. Sci. Dept., Univ. of Brasilia, Brasilia, Brazil
fYear :
2014
fDate :
2-5 Nov. 2014
Firstpage :
16
Lastpage :
22
Abstract :
In bioinformatics, many scientific experiments are designed as computational workflows. However, the amount of data generated increases at every phase of each execution, which makes it hard to identify the source of the data and the transformations applied to it. It has therefore become necessary to create new tools that store data provenance, chiefly which resources and parameters were used to generate the results, among other information, so that the experiment can be validated and published. In this paper, we propose using a graph database to store the data provenance of bioinformatics workflows, modeled with PROV-DM. To validate the model, we developed a simulator that works as a logbook to capture data provenance. A workflow run on real genomic data showed that very little additional data needs to be stored, which means that our provenance model can easily be included in genome projects.
Keywords :
bioinformatics; data handling; data models; database management systems; genomics; storage management; PROV-DM model; bioinformatics workflows; genome project workflows; genomic data; graph database; logbook; provenance data storage; simulator; Bioinformatics; Biological system modeling; Data models; Databases; Genomics; Pipelines; PROV-DM; bioinformatics; data provenance; genome projects; graph database; workflow
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
Conference_Location :
Belfast
Type :
conf
DOI :
10.1109/BIBM.2014.6999292
Filename :
6999292