• DocumentCode
    3647446
  • Title

    Multicluster Hadoop Distributed File System

  • Author

    I. Tomašić;J. Ugovšek;A. Rashkovska;R. Trobec

  • Author_Institution
    Jož
  • fYear
    2012
  • fDate
    5/1/2012 12:00:00 AM
  • Firstpage
    301
  • Lastpage
    305
  • Abstract
    The Hadoop Distributed File System (HDFS) is one of the important subprojects of the Apache Hadoop project; it enables distributed processing of, and fast access to, large data sets on distributed storage platforms. HDFS is normally installed on a cluster of computers. When the cluster becomes undersized, one commonly used option is to scale the cluster by adding new computers and storage devices. Another option, not exploited so far, is to draw on the resources of another computer cluster. In this paper we present a multicluster HDFS installation extended across two clusters, with different operating systems, connected over the Internet. The specific networking parameters and HDFS configuration parameters needed for a multicluster installation are presented. We have benchmarked a single-cluster and a dual-cluster installation with the same networking and configuration parameters. The benchmark results indicate that multicluster HDFS provides increased storage capacity; however, the data manipulation speed is limited by the bandwidth of the communication channel that connects the two clusters.
  • Keywords
    "Bandwidth","File systems","Benchmark testing","Computers","Cloud computing","Operating systems"
  • Publisher
    IEEE
  • Conference_Titel
    MIPRO, 2012 Proceedings of the 35th International Convention
  • Print_ISBN
    978-1-4673-2577-6
  • Type

    conf

  • Filename
    6240660