• DocumentCode
    694405
  • Title
    HView: Multi-dimension view of massive data in Hadoop
  • Author
    Fuhui Wu ; Qingbo Wu ; Yusong Tan
  • Author_Institution
    Sch. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha, China
  • fYear
    2013
  • fDate
    12-13 Oct. 2013
  • Firstpage
    430
  • Lastpage
    433
  • Abstract
    Hadoop has become an attractive platform for storing large-scale data in HDFS and performing analytics with the MapReduce framework. However, a multi-field dataset in HDFS is usually stored along just one dimension, so analytics in Hadoop usually have to process the whole dataset by brute force. In this paper, we introduce HView, an extension of the data layout in HDFS that stores data according to multiple fields. HView provides users with different dimensional views of the same dataset in HDFS. HView does not require modifying Hadoop, increasing DataNode storage occupancy, or adding pressure on the NameNode. We examine Map-side join as a use case for HView. Experimental results show that HView improves the efficiency of Map-side join and solves its size-limit problem.
  • Keywords
    data warehouses; distributed databases; parallel programming; DataNode storage; HDFS; HView; Hadoop; Map-side join efficiency improvement; MapReduce framework; Namenode pressure; data layout extension; dataset dimension views; large-scale data storage; massive data; multidimension view; multifield dataset; perform analytics; size limit problem; Computer science; Data analysis; Educational institutions; File systems; Indexes; Layout; Partitioning algorithms; HView; Hadoop; join; layout; warehouse;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Network Technology (ICCSNT), 2013 3rd International Conference on
  • Conference_Location
    Dalian
  • Type
    conf
  • DOI
    10.1109/ICCSNT.2013.6967146
  • Filename
    6967146