• DocumentCode
    1626902
  • Title

    Efficient prefetching technique for storage of heterogeneous small files in Hadoop Distributed File System Federation

  • Author

    Aishwarya, K. ; Arvind Ram, A. ; Sreevatson, M.C. ; Babu, Chitra ; Prabavathy, B.

  • fYear
    2013
  • Firstpage
    523
  • Lastpage
    530
  • Abstract
    Hadoop Distributed File System Federation [5] is used to store and manage large files. This has been used in a university scenario to store various categories of files such as PDFs, audio, video, presentation and image files. However, HDFS Federation suffers a performance penalty when storing a large number of small files. Also, scaling the namenodes in HDFS Federation does not solve the small files problem [7] but only delays the metadata accumulation. One approach to handle this problem was implemented in BlueSky [1], one of the most prevalent e-learning resources in China. However, this system does not handle files from heterogeneous users, and its prefetching mechanism takes into account only the locality of reference and does not consider file access patterns. The objective of this paper is to address the above-mentioned shortcomings by developing an efficient approach to handle files from heterogeneous users and by devising an efficient prefetching algorithm based on file access patterns. The file access patterns are stored and updated in a priority heap. Heterogeneous users can upload their files, and complete transparency is maintained in grouping small files into a large file. This approach of merging several small files into a large file reduces the memory footprint in Federated HDFS. In addition to the existing features, this paper also provides options to modify and delete the files stored by users in Federated HDFS. The performance of original HDFS Federation and of the proposed system is benchmarked with a set of 100,000 small files. The experimental results show that memory usage was reduced by 36% relative to original HDFS Federation. File read time was reduced by 94% (with prefetching based on file access patterns) compared to the proposed system without prefetching, and by 92% compared to prefetching based on the locality of reference.
  • Keywords
    distributed databases; meta data; network operating systems; storage management; BlueSky; China; HDFS federation; Hadoop distributed file system federation; e-learning resources; file access patterns; file access prefetching; file read time; heterogeneous small file storage; heterogeneous users; memory footprint; metadata accumulation; prefetching mechanism; prefetching technique; Delays; Educational institutions; Heart beat; Indexes; Lead; Merging; Prefetching; HDFS Federation; files access pattern; metadata; prefetching; small files problem;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Computing (ICoAC), 2013 Fifth International Conference on
  • Conference_Location
    Chennai
  • Print_ISBN
    978-1-4799-3447-8
  • Type

    conf

  • DOI
    10.1109/ICoAC.2013.6922006
  • Filename
    6922006