DocumentCode
659439
Title
Scalable data citation in dynamic, large databases: Model and reference implementation
Author
Proll, Stefan ; Rauber, Andreas
Author_Institution
SBA Res., Vienna, Austria
fYear
2013
fDate
6-9 Oct. 2013
Firstpage
307
Lastpage
312
Abstract
Uniquely and precisely identifying and citing arbitrary subsets of data is essential in many settings, e.g. to facilitate experiment validation and data re-use in meta-studies. Current approaches relying on pointers to entire data collections or on explicit copies of data do not scale. We propose a novel approach relying on persistent, timestamped, adapted queries to versioned and timestamped data sources. Result set hashes are used to validate correctness upon later re-execution. The proposed method works for both static and dynamically growing or changing data. Alternative implementation styles for relational databases are presented and evaluated with regard to performance and impact on existing applications, while aiming at minimal to no additional effort for data users. The approach is validated in an infrastructure monitoring domain relying on sensor data networks.
Keywords
citation analysis; query processing; relational databases; adapted queries; data citation; data sources; dynamic database; infrastructure monitoring domain; large database; model implementation; persistent queries; reference implementation; sensor data network; timestamped queries; Data models; History; Monitoring; Relational databases; Sorting; Standards
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data, 2013 IEEE International Conference on
Conference_Location
Silicon Valley, CA
Type
conf
DOI
10.1109/BigData.2013.6691588
Filename
6691588