DocumentCode :
652241
Title :
SciHive: Array-Based Query Processing with HiveQL
Author :
Yifeng Geng ; Xiaomeng Huang ; Meiqi Zhu ; Huabin Ruan ; Guangwen Yang
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
fYear :
2013
fDate :
16-18 July 2013
Firstpage :
887
Lastpage :
894
Abstract :
Data-intensive scientific discovery is generating huge amounts of data at an alarming rate. Most of these data are multidimensional and stored in array-based file formats, and processing such big data has become an urgent challenge. In this paper, we present SciHive, a scalable and easy-to-use array-based query system that enables scientists to process raw array datasets in parallel with a SQL-like query language. We implement SciHive as an extension of Hive, a data warehouse system built on Hadoop. SciHive maps the arrays in NetCDF files to a table and executes queries via MapReduce. Because files are loaded dynamically as needed, SciHive requires no additional pre-loading or format-conversion step. In addition, SciHive includes two optimization methods that reduce the number of generated rows. Experiments with different queries on representative datasets show that the optimizations are very effective in most cases and that SciHive scales to handle large datasets.
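To make the idea of mapping an array to a table concrete, here is a minimal Python sketch, under stated assumptions, of how an n-dimensional array can be exposed as rows of (dimension indices..., value). The abstract does not describe SciHive's actual row layout or what its two optimization methods are. The function `array_to_rows`, the flat row-major input, and the index-range restriction (one plausible way to avoid generating unneeded rows) are hypothetical illustrations, not SciHive's, Hive's, or NetCDF's API.

```python
from itertools import product
from typing import Iterator, Optional, Sequence


def array_to_rows(data: Sequence[float], shape: Sequence[int],
                  ranges: Optional[Sequence[range]] = None) -> Iterator[tuple]:
    """Hypothetical helper: expand a row-major n-D array into table rows.

    Each row is (i0, i1, ..., value). `data` is the array flattened in
    row-major (C) order. If `ranges` is given, only indices inside those
    per-dimension ranges are expanded (an assumed index-range pushdown,
    not necessarily one of SciHive's optimizations).
    """
    if ranges is None:
        ranges = [range(n) for n in shape]
    # Row-major strides: the last dimension varies fastest.
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    for idx in product(*ranges):
        offset = sum(i * s for i, s in zip(idx, strides))
        yield (*idx, data[offset])


if __name__ == "__main__":
    # A 2 x 3 array, e.g. (time, station), flattened in row-major order.
    data = [10, 11, 12, 20, 21, 22]
    shape = (2, 3)

    # Naive: expand every cell (6 rows), then filter on the first index.
    naive = [r for r in array_to_rows(data, shape) if r[0] == 1]

    # Pushdown: expand only the cells whose first index is 1 (3 rows).
    pushed = list(array_to_rows(data, shape, [range(1, 2), range(3)]))

    assert naive == pushed
    print(pushed)  # [(1, 0, 20), (1, 1, 21), (1, 2, 22)]
```

Both paths return the same three rows, but the naive path generates all six rows before discarding half of them, while restricting the index ranges up front generates only the rows the query needs. That difference is the kind of saving a row-reducing optimization targets.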
Keywords :
SQL; data mining; data warehouses; file organisation; parallel processing; public domain software; query languages; query processing; Hadoop; HiveQL; MapReduce; NetCDF files; SQL-like query language; SciHive; alarming rate; array-based file formats; array-based query processing; big data processing; data warehouse system; data-intensive scientific discovery; easy-to-use array-based query system; format conversion procedure; multidimensional data; optimization methods; raw array dataset process; Arrays; Data models; Database languages; Indexes; Information management; Libraries; Optimization; Hive; MapReduce; array data; big data; query optimization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on
Conference_Location :
Melbourne, VIC
Type :
conf
DOI :
10.1109/TrustCom.2013.108
Filename :
6680928