Title :
HView: Multi-dimension view of massive data in Hadoop
Author :
Fuhui Wu ; Qingbo Wu ; Yusong Tan
Author_Institution :
Sch. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
Hadoop has become an attractive platform for storing large-scale data in HDFS and performing analytics with the MapReduce framework. However, a multi-field dataset in HDFS is usually stored along just one dimension. Consequently, analytics in Hadoop usually have to process the whole dataset by brute force. In this paper, we introduce HView, an extension of the HDFS data layout that stores data according to multiple fields. HView provides users with views of the same HDFS dataset along different dimensions. HView does not require modifying Hadoop, does not increase DataNode storage usage, and does not add load on the NameNode. We apply HView to a Map-side join use case. Experimental results show that HView can improve the efficiency of Map-side join and overcome its size limit.
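The abstract links a multi-field data layout to Map-side join. As a rough illustration of the general technique only (not HView's implementation, which the abstract does not describe), the hypothetical Python sketch below shows why data already laid out by the join field helps. If both inputs are hash-partitioned on the join key with the same function and partition count, partition i of one input can only match partition i of the other. Each map task can therefore join its pair of partitions locally, with no shuffle. The names `partition_by` and `map_side_join`, the use of hash partitioning, the per-partition hash join (Hadoop's own `CompositeInputFormat` instead merges identically partitioned, sorted inputs), and the in-memory lists standing in for HDFS files are all assumptions.

```python
from collections import defaultdict


def partition_by(records, field, num_partitions):
    """Hypothetical helper: hash-partition records on one field.

    Records sharing a value of `field` always land in the same bucket,
    which stands in here for a per-field data layout.
    """
    parts = [[] for _ in range(num_partitions)]
    for rec in records:
        parts[hash(rec[field]) % num_partitions].append(rec)
    return parts


def map_side_join(left_parts, right_parts, left_key, right_key):
    """Hypothetical sketch: join co-partitioned inputs partition by partition.

    Both inputs must be partitioned on the join key with the same function
    and the same number of partitions, so matches never cross partitions.
    Each (left, right) pair plays the role of one map task's input.
    """
    out = []
    for lp, rp in zip(left_parts, right_parts):
        # Build an in-memory index over one right-hand partition only.
        index = defaultdict(list)
        for r in rp:
            index[r[right_key]].append(r)
        # Probe it with the matching left-hand partition.
        for l in lp:
            for r in index.get(l[left_key], []):
                out.append({**l, **r})
    return out
```

For example, `map_side_join(partition_by(orders, "cust", 4), partition_by(customers, "cust", 4), "cust", "cust")` joins two hypothetical record lists on `cust`. Only one right-hand partition is indexed in memory at a time. A broadcast-style Map-side join, by contrast, loads the whole smaller input into every task, and that is where its size limit comes from.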
Keywords :
data warehouses; distributed databases; parallel programming; DataNode storage; HDFS; HView; Hadoop; Map-side join efficiency improvement; MapReduce framework; Namenode pressure; data layout extension; dataset dimension views; large-scale data storage; massive data; multidimension view; multifield dataset; perform analytics; size limit problem; Computer science; Data analysis; Educational institutions; File systems; Indexes; Layout; Partitioning algorithms; HView; Hadoop; join; layout; warehouse;
Conference_Title :
Computer Science and Network Technology (ICCSNT), 2013 3rd International Conference on
Conference_Location :
Dalian
DOI :
10.1109/ICCSNT.2013.6967146