DocumentCode
3769968
Title
Capturing provenance for big data analytics done using SQL interface
Author
Anu Mary Chacko;Ajeeb M Basheer;S D Madhu Kumar
Author_Institution
Department of Computer Science and Engineering, National Institute of Technology, Calicut, India 673601
fYear
2015
Firstpage
1
Lastpage
6
Abstract
In this era of data explosion, big data research is gaining importance. A collection of new technologies has emerged for managing big data, such as NoSQL databases (e.g. MongoDB, Cassandra) and analytic tools (e.g. MapReduce, Hive). These tools lack the SQL query interface that users are very familiar with. Since Postgres 9.1, designers have had the option of using foreign data wrappers to interface Postgres with data held in other data stores, which may or may not be relational. Using foreign data wrappers, we can link data in external data stores to the Postgres interface and analyze the data residing in those stores with SQL queries. Provenance is metadata that captures the relation between input data and the output result, which is very useful when debugging that result. PERM is a tool developed as an extension to Postgres 8.3 to make Postgres provenance-aware. In this paper we present an extension of PERM that captures provenance for data accessed from external data stores through foreign data wrappers. PERM implements PERM Influence Contribution Semantics. We propose an extension to the contribution semantics currently used by PERM to capture 'when' and 'who' provenance, which are important in the context of big data analytics. We ported PERM to Postgres 9.3 and added new modules for capturing 'when' provenance. The implementation was verified by writing a foreign data wrapper for MongoDB, and performance was evaluated by writing queries against it.
Keywords
"Semantics","Big data","Context","Relational databases","Standards","Computers"
Publisher
IEEE
Conference_Titel
Electrical Computer and Electronics (UPCON), 2015 IEEE UP Section Conference on
Type
conf
DOI
10.1109/UPCON.2015.7456749
Filename
7456749