Title :
The DBMS - your big data sommelier
Author :
Kargin, Yagiz ; Kersten, Martin ; Manegold, Stefan ; Pirk, Holger
Author_Institution :
Database Architectures Group, Centrum Wiskunde & Informatica (CWI), Amsterdam, Netherlands
Abstract :
When addressing the problem of “big” data volume, preparation costs are one of the key challenges: the high costs of loading, aggregating and indexing data lead to a long data-to-insight time. In addition to being a nuisance to the end-user, this latency prevents real-time analytics on “big” data. Fortunately, data often comes in semantic chunks, such as files, whose data items share characteristics such as acquisition time or location. A data management system that exploits this trait can significantly lower the data preparation costs and the associated data-to-insight time by only investing in the preparation of the relevant chunks. In this paper, we develop such a system as an extension of an existing relational DBMS (MonetDB). To this end, we develop a query processing paradigm and data storage model that are partial-loading aware. The result is a system that can make a 1.2 TB dataset (consisting of 4000 chunks) ready for querying in less than 3 minutes on a single server-class machine while maintaining good query processing performance.
Keywords :
Big Data; data models; data preparation; query processing; relational databases; storage management; Big Data analytics; Big Data sommelier; Big Data volume preparation costs; MonetDB; associated data-to-insight time; data aggregation; data indexing; data items; data loading; data management system; data storage model; partial-loading aware; query processing performance; relational DBMS; semantic chunks; single server-class machine; Loading; Optimization; Query processing; Relational databases; Semantics; Transforms;
Conference_Title :
Data Engineering (ICDE), 2015 IEEE 31st International Conference on
Conference_Location :
Seoul
DOI :
10.1109/ICDE.2015.7113361