DocumentCode
694405
Title
HView: Multi-dimension view of massive data in Hadoop
Author
Fuhui Wu ; Qingbo Wu ; Yusong Tan
Author_Institution
Sch. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha, China
fYear
2013
fDate
12-13 Oct. 2013
Firstpage
430
Lastpage
433
Abstract
Hadoop has become an attractive platform for storing large-scale data in HDFS and performing analytics with the MapReduce framework. However, a multi-field dataset in HDFS is usually laid out along only one dimension. Analytics in Hadoop usually have to process the whole dataset by brute force. In this paper, we introduce HView, an extension of the HDFS data layout that stores data according to multiple fields. HView provides users with different dimension views of the same dataset in HDFS. HView does not require modifying Hadoop, does not increase DataNode storage usage, and does not add pressure on the NameNode. We explore Map-side join as a use case for HView. Experimental results show that HView improves the efficiency of Map-side join and solves its size-limit problem.
Keywords
data warehouses; distributed databases; parallel programming; DataNode storage; HDFS; HView; Hadoop; Map-side join efficiency improvement; MapReduce framework; Namenode pressure; data layout extension; dataset dimension views; large-scale data storage; massive data; multidimension view; multifield dataset; perform analytics; size limit problem; Computer science; Data analysis; Educational institutions; File systems; Indexes; Layout; Partitioning algorithms; HView; Hadoop; join; layout; warehouse;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Network Technology (ICCSNT), 2013 3rd International Conference on
Conference_Location
Dalian
Type
conf
DOI
10.1109/ICCSNT.2013.6967146
Filename
6967146