DocumentCode :
2079301
Title :
Hive - a petabyte scale data warehouse using Hadoop
Author :
Thusoo, Ashish ; Sarma, Joydeep Sen ; Jain, Namit ; Shao, Zheng ; Chakka, Prasad ; Zhang, Ning ; Antony, Suresh ; Liu, Hao ; Murthy, Raghotham
Author_Institution :
Facebook Data Infrastructure Team, CA, USA
fYear :
2010
fDate :
1-6 March 2010
Firstpage :
996
Lastpage :
1005
Abstract :
The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop is a popular open-source map-reduce implementation which is being used in companies like Yahoo, Facebook etc. to store and process extremely large data sets on commodity hardware. However, the map-reduce programming model is very low level and requires developers to write custom programs which are hard to maintain and reuse. In this paper, we present Hive, an open-source data warehousing solution built on top of Hadoop. Hive supports queries expressed in a SQL-like declarative language - HiveQL, which are compiled into map-reduce jobs that are executed using Hadoop. In addition, HiveQL enables users to plug in custom map-reduce scripts into queries. The language includes a type system with support for tables containing primitive types, collections like arrays and maps, and nested compositions of the same. The underlying IO libraries can be extended to query data in custom formats. Hive also includes a system catalog - Metastore - that contains schemas and statistics, which are useful in data exploration, query optimization and query compilation. In Facebook, the Hive warehouse contains tens of thousands of tables and stores over 700TB of data and is being used extensively for both reporting and ad-hoc analyses by more than 200 users per month.
Keywords :
SQL; competitive intelligence; data warehouses; public domain software; query processing; Hadoop software; HiveQL language; Metastore system catalog; SQL-like declarative language; arrays; business intelligence; data exploration; map-reduce jobs; maps; nested compositions; open-source map-reduce implementation; petabyte scale data warehouse; primitive types; query compilation; query optimization; Companies; Data warehouses; Facebook; Hardware; Libraries; Open source software; Plugs; Query processing; Statistics; Warehousing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2010 IEEE 26th International Conference on Data Engineering (ICDE)
Conference_Location :
Long Beach, CA
Print_ISBN :
978-1-4244-5445-7
Electronic_ISBN :
978-1-4244-5444-0
Type :
conf
DOI :
10.1109/ICDE.2010.5447738
Filename :
5447738