Title :
Efficient Versioning for Scientific Array Databases
Author :
Seering, Adam ; Cudre-Mauroux, Philippe ; Madden, Samuel ; Stonebraker, Michael
Author_Institution :
MIT CSAIL, Cambridge, MA, USA
Abstract :
In this paper, we describe a versioned database storage manager we are developing for the SciDB scientific database. The system is designed to efficiently store and retrieve array-oriented data, exposing a "no-overwrite" storage model in which each update creates a new "version" of an array. This makes it possible to compare versions produced at different times or by different algorithms, and to create complex chains and trees of versions. We present algorithms to efficiently encode these versions, minimizing storage space while still providing efficient access to the data. Additionally, we present an optimal algorithm that, given a long sequence of versions, determines which versions to encode in terms of each other (using delta compression) to minimize total storage space or query execution cost. We compare the performance of these algorithms on real-world data sets from the National Oceanic and Atmospheric Administration (NOAA), OpenStreetMap, and several other sources. We show that our algorithms provide better performance than existing version control systems not optimized for array data, in terms of both storage size and access time, and that our delta-compression algorithms substantially reduce the total storage space when versions are highly similar.
Keywords :
configuration management; data compression; database management systems; query processing; scientific information systems; storage management; NOAA; National Oceanic and Atmospheric Administration; SciDB scientific database; access time; array-oriented data; data access; delta compression; delta-compression algorithm; no-overwrite storage model; open street maps; optimal algorithm; query execution cost; scientific array database; storage size; storage space; version control system; versioned database storage manager; Arrays; Data models; Databases; Encoding; Image coding; Layout; Prototypes
Conference_Title :
Data Engineering (ICDE), 2012 IEEE 28th International Conference on
Conference_Location :
Washington, DC
Print_ISBN :
978-1-4673-0042-1
DOI :
10.1109/ICDE.2012.102