DocumentCode :
583221
Title :
A three-dimensional data model in HBase for large time-series dataset analysis
Author :
Han, Dan ; Stroulia, Eleni
Author_Institution :
Dept. of Comput. Sci., Univ. of Alberta, Edmonton, AB, Canada
fYear :
2012
fDate :
24 Sept. 2012
Firstpage :
47
Lastpage :
56
Abstract :
As applications move from traditional enterprise infrastructures to cloud infrastructures, scalable database management systems play an important role in efficiently managing and analysing unprecedentedly large amounts of data. Compared to RDBMSs, NoSQL databases are more attractive for addressing this challenge. However, managing data effectively in a NoSQL database is not easy for non-expert users, because support for data organization is scarce. A poor data organization may inadvertently misuse the features of a NoSQL database and lead to unsatisfactory performance. A systematic method for NoSQL data-schema design is therefore a timely and important problem for researchers and practitioners. HBase, a particular NoSQL database offering, relies (a) on HDFS for its distributed and replicated storage, and (b) on coprocessors for efficient parallel query processing. To harness the potential benefits of parallelism, the data must be partitioned appropriately across the HBase storage. We investigate the effectiveness of a three-dimensional data model, which uses the “version” dimension of HBase to store the values of a data item over time. We have experimentally evaluated the performance impact of this type of data model on two data sets of different sizes and different time spans. For each data set, we compared the performance of several ad-hoc queries, implemented with the HBase Coprocessor framework, across different data schemas, some of which use the third HBase dimension and some of which do not. The experimental results show improved performance for the data schemas that use the third dimension of HBase.
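The central idea of the abstract, storing the successive values of a data item as HBase cell versions instead of as separate rows, can be illustrated with a toy in-memory model. This is a hypothetical sketch, not the authors' schema and not the HBase client API: `ToyTable`, `scan_2d`, the `d:value` column and the `item#timestamp` row-key format are all assumptions made for illustration. It mimics two documented HBase behaviours: versions of a cell are returned newest first, and a column family retains at most a configured number of versions (in a real table this is the column family's `VERSIONS` setting, which would have to be raised for a version-based time series).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical stand-in for HBase's logical layout (NOT the HBase API):
# row key -> "family:qualifier" -> {timestamp (version): value}
Cell = Dict[int, float]


class ToyTable:
    def __init__(self, max_versions: int) -> None:
        self.max_versions = max_versions
        self.rows: Dict[str, Dict[str, Cell]] = defaultdict(dict)

    def put(self, row: str, column: str, ts: int, value: float) -> None:
        cell = self.rows[row].setdefault(column, {})
        cell[ts] = value
        # Like a column family's VERSIONS limit: keep only the newest versions.
        for old_ts in sorted(cell, reverse=True)[self.max_versions:]:
            del cell[old_ts]

    def get_versions(self, row: str, column: str,
                     t_min: int, t_max: int) -> List[Tuple[int, float]]:
        # Versions with t_min <= ts < t_max, newest first (HBase ordering).
        cell = self.rows.get(row, {}).get(column, {})
        return [(ts, cell[ts]) for ts in sorted(cell, reverse=True)
                if t_min <= ts < t_max]


def scan_2d(table: ToyTable, item: str,
            t_min: int, t_max: int) -> List[Tuple[int, float]]:
    # "2D" schema: time is encoded in the row key, so a time window
    # becomes a lexicographic row-key range scan [start, stop).
    start, stop = f"{item}#{t_min:010d}", f"{item}#{t_max:010d}"
    out = []
    for row in sorted(table.rows):
        if start <= row < stop:
            cell = table.rows[row]["d:value"]
            out.append((int(row.split("#")[1]), next(iter(cell.values()))))
    return out


# Usage: the same four readings of a hypothetical sensor "s1" in both schemas.
readings = [(1, 20.5), (2, 21.0), (3, 22.4), (4, 19.8)]
two_d = ToyTable(max_versions=1)       # one row per (item, time)
three_d = ToyTable(max_versions=1000)  # one row per item, time as version
for t, v in readings:
    two_d.put(f"s1#{t:010d}", "d:value", t, v)
    three_d.put("s1", "d:value", t, v)

print(scan_2d(two_d, "s1", 2, 4))                        # → [(2, 21.0), (3, 22.4)]
print(three_d.get_versions("s1", "d:value", 2, 4))      # → [(3, 22.4), (2, 21.0)]
```

The toy only shows the structural difference: the row-key schema creates one row per (item, time) pair, while the version-based schema keeps one row per item and answers a time-window query with a version range inside that row. It does not model HDFS storage, region partitioning or coprocessor execution, so it says nothing about which schema is faster; that comparison is what the paper's experiments address.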
Keywords :
coprocessors; data analysis; data models; distributed databases; file organisation; parallel processing; time series; HBase; HBase coprocessor framework; NoSQL database data-schema design; RDBMS; ad-hoc queries; cloud infrastructures; data analysis; data partitioning; data schemas; data-organization support; database management system; distributed storage; enterprise infrastructures; parallel query processing; replicated storage; three-dimensional data model; time-series dataset analysis; Conferences; Coprocessors; Data models; Databases; Maintenance engineering; Measurement; Organizations; Coprocessor; Data Model; Data Schema; HBase; Time-Series Dataset;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Maintenance and Evolution of Service-Oriented and Cloud-Based Systems (MESOCA), 2012 IEEE 6th International Workshop on the
Conference_Location :
Trento
Print_ISBN :
978-1-4673-3002-2
Type :
conf
DOI :
10.1109/MESOCA.2012.6392598
Filename :
6392598