Title :
HGrid: A Data Model for Large Geospatial Data Sets in HBase
Author :
Han, Dan; Stroulia, Eleni
Author_Institution :
Dept. of Comput. Sci., Univ. of Alberta, Edmonton, AB, Canada
Date :
June 28 - July 3, 2013
Abstract :
Cloud-based infrastructures enable applications to collect and analyze massive amounts of data. Whether these applications are newly developed or are being evolved from existing RDBMS-based implementations, NoSQL databases offer an attractive platform with which to address this challenge. However, developers find it difficult to manage data effectively in NoSQL databases, because these platforms offer little support for data organization. Since poor data organization may misuse the features of the NoSQL database and result in unsatisfactory performance, developing a systematic method for NoSQL data-schema design is a timely and important problem. In this paper, we focus on geospatial applications, a family of big-data systems with distinct data types and usage patterns that need scalability. We propose the HGrid data model for HBase, based on a hybrid index structure that combines a quad-tree and a regular grid as primary and secondary indices, respectively. We have comparatively evaluated the performance of HGrid with uniform and skewed data against two other data models, based on quad-tree and regular-grid indices. Our results demonstrate that HGrid scales well and supports efficient range and k-nearest neighbor queries. Although this model does not outperform all its competitors in terms of query response time, it is more flexible for discontinuous and skewed space, and its index requires less space than the corresponding quad-tree and regular-grid indices, which makes its deployment possible with fewer resources. Through this study, we also formulate a set of guidelines on how to organize data for geospatial applications in HBase.
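The hybrid index named in the abstract can be illustrated with a minimal sketch. It is not the schema from the paper: the functions `quad_code` and `hybrid_key`, the unit-square coordinate space, the parameters `depth` and `grid_n`, and the choice to place the grid row in the row key and the grid column in the column qualifier are all assumptions made for illustration. The sketch shows only the general idea of locating a point first by a coarse quad-tree tile (primary index) and then by a regular-grid cell inside that tile (secondary index).

```python
def quad_code(x, y, depth):
    """Return the quad-tree code of the tile containing (x, y) in the unit
    square, plus that tile's origin and side length. Hypothetical helper."""
    digits = []
    x0, y0, size = 0.0, 0.0, 1.0
    for _ in range(depth):
        size /= 2
        qx = 1 if x >= x0 + size else 0   # right half?
        qy = 1 if y >= y0 + size else 0   # upper half?
        digits.append(str(2 * qy + qx))   # quadrant digit in Z-order: 0..3
        x0 += qx * size
        y0 += qy * size
    return "".join(digits), (x0, y0, size)


def hybrid_key(x, y, depth=2, grid_n=4):
    """Map a point to an (assumed) row key and column qualifier.

    Primary index: quad-tree tile code. Secondary index: cell of a
    grid_n x grid_n regular grid laid over that tile.
    """
    tile, (x0, y0, size) = quad_code(x, y, depth)
    cell = size / grid_n
    # Clamp so that points on the upper boundary fall into the last cell.
    col = min(int((x - x0) / cell), grid_n - 1)
    row = min(int((y - y0) / cell), grid_n - 1)
    return f"{tile}-{row:02d}", f"{col:02d}"


if __name__ == "__main__":
    print(hybrid_key(0.1, 0.1))
    print(hybrid_key(0.9, 0.3))
```

Under these assumptions, points that share a tile code share a row-key prefix, so a range query can scan a contiguous key range per tile and then filter by grid cell. Whether this matches the layout evaluated in the paper should be checked against the full text.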
Keywords :
Big Data; cloud computing; data analysis; data models; database indexing; grid computing; quadtrees; query processing; relational databases; visual databases; HBase; HGrid data model; NoSQL database data-schema design; RDBMS-based implementation; big-data systems; cloud-based infrastructures; data collection; data management; data organization; data types; geospatial application; hybrid index structure; k-nearest neighbor queries; large geospatial data sets; quad-tree; query response time; regular grid; skewed data; uniform data; usage patterns; Coprocessors; Geospatial analysis; Indexes; Organizations; Tiles; Coprocessor; Data Model; Data Schema; Geospatial Data Set
Conference_Title :
Cloud Computing (CLOUD), 2013 IEEE Sixth International Conference on
Conference_Location :
Santa Clara, CA
Print_ISBN :
978-0-7695-5028-2
DOI :
10.1109/CLOUD.2013.78