• DocumentCode
    2301268
  • Title

    Multi-file queries performance improvement through data placement in Hadoop

  • Author

    Yu Tang ; Abdulhay, E. ; Aihua Fan ; Sheng Su ; Gebreselassie, K.

  • Author_Institution
    Univ. of Electron. Sci. & Technol. of China, Chengdu, China
  • fYear
    2012
  • fDate
    29-31 Dec. 2012
  • Firstpage
    986
  • Lastpage
    991
  • Abstract
    Hadoop is popular for processing data-intensive jobs because of its data locality feature. However, the performance gained from this feature is currently limited by Hadoop's default block placement policy, which implicitly assumes that each MapReduce job instance accesses data from a single file. In contrast, multi-file queries such as indexing or aggregation queries need to process related data from more than one file, stored on different DataNodes of a cluster. In this paper we proposed a Correlation-based Block Placement (CBP) algorithm that enhances the performance of these queries by placing related blocks on the same set of DataNodes. Furthermore, we developed a customized InputFormat that enables InputSplits to contain records from different files. Simulation results demonstrated that the number of migrating data blocks for CBP was insignificant, whereas under the default policy the number of migrating data blocks increased significantly with the input dataset size. As a result, for any input dataset size, the performance of CBP exceeded that of the default policy.
  • Keywords
    distributed processing; query processing; CBP; Hadoop; aggregation query; block placement policy; correlation-based block placement algorithm; data locality feature; data placement; data-intensive job processing; indexing query; multifile queries performance improvement; multifile query; Block Placement; Correlation; Data locality; HDFS;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Network Technology (ICCSNT), 2012 2nd International Conference on
  • Conference_Location
    Changchun
  • Print_ISBN
    978-1-4673-2963-7
  • Type

    conf

  • DOI
    10.1109/ICCSNT.2012.6526092
  • Filename
    6526092