• DocumentCode
    659439
  • Title
    Scalable data citation in dynamic, large databases: Model and reference implementation
  • Author
    Proll, Stefan ; Rauber, Andreas
  • Author_Institution
    SBA Res., Vienna, Austria
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    307
  • Lastpage
    312
  • Abstract
    Uniquely and precisely identifying and citing arbitrary subsets of data is essential in many settings, e.g. to facilitate experiment validation and data re-use in meta-studies. Current approaches, which rely on pointers to entire data collections or on explicit copies of data, do not scale. We propose a novel approach relying on persistent, timestamped, adapted queries to versioned and timestamped data sources. Result set hashes are used to validate correctness on later re-execution. The proposed method works for static as well as dynamically growing or changing data. Alternative implementation styles for relational databases are presented and evaluated with regard to performance and impact on existing applications, while aiming at minimal to no additional effort for data users. The approach is validated in an infrastructure monitoring domain relying on sensor data networks.
  • Keywords
    citation analysis; query processing; relational databases; adapted queries; data citation; data sources; dynamic database; infrastructure monitoring domain; large database; model implementation; persistent queries; reference implementation; relational databases; sensor data network; timestamped queries; Data models; History; Monitoring; Relational databases; Sorting; Standards;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data, 2013 IEEE International Conference on
  • Conference_Location
    Silicon Valley, CA
  • Type
    conf
  • DOI
    10.1109/BigData.2013.6691588
  • Filename
    6691588
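
The idea summarized in the abstract (a timestamped query against versioned data, checked by a result set hash on re-execution) can be illustrated with a minimal sketch. This is not the paper's reference implementation: the `readings` table, its `valid_from`/`valid_to` columns, the integer logical timestamps, and the helpers `result_hash` and `run_as_of` are all hypothetical names chosen for illustration, and SQLite stands in for whatever relational database is actually used.

```python
import hashlib
import sqlite3


def result_hash(rows):
    """Hash a result set in a stable order so re-executions are comparable."""
    h = hashlib.sha256()
    for row in sorted(rows):
        h.update(repr(row).encode("utf-8"))
    return h.hexdigest()


def run_as_of(conn, ts):
    """Return the readings that were valid at logical time ts (assumed schema)."""
    return conn.execute(
        "SELECT sensor_id, value FROM readings "
        "WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY sensor_id",
        (ts, ts),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (sensor_id INTEGER, value REAL, "
    "valid_from INTEGER, valid_to INTEGER)"
)

# Time 1: two sensor readings arrive; valid_to is NULL while a version is current.
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, NULL)",
    [(1, 20.5, 1), (2, 19.0, 1)],
)

# Time 2: a user cites the subset; store the timestamp and the result set hash.
citation = {"timestamp": 2, "hash": result_hash(run_as_of(conn, 2))}

# Time 3: sensor 1 is updated. The old version is closed, not overwritten.
conn.execute(
    "UPDATE readings SET valid_to = 3 WHERE sensor_id = 1 AND valid_to IS NULL"
)
conn.execute("INSERT INTO readings VALUES (1, 21.7, 3, NULL)")

# Later: re-execute the cited query as of its timestamp and verify the hash.
cited_rows = run_as_of(conn, citation["timestamp"])
current_rows = run_as_of(conn, 4)
verified = result_hash(cited_rows) == citation["hash"]
```

In this sketch the cited subset stays reproducible after the update because old row versions are kept and the query is re-run against the stored timestamp; the hash comparison only detects a mismatch and does not by itself explain its cause.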