• DocumentCode
    2262705
  • Title

    ERMS: An Elastic Replication Management System for HDFS

  • Author

    Cheng, Zhendong ; Luan, Zhongzhi ; Meng, You ; Xu, Yijing ; Qian, Depei ; Roy, Alain ; Zhang, Ning ; Guan, Gang

  • Author_Institution
    Sino-German Joint Software Inst., Beihang Univ., Beijing, China
  • fYear
    2012
  • fDate
    24-28 Sept. 2012
  • Firstpage
    32
  • Lastpage
    40
  • Abstract
    The Hadoop Distributed File System (HDFS) is a distributed storage system that stores large-scale data sets reliably and streams those data sets to applications at high bandwidth. HDFS provides high performance, reliability, and availability by replicating data, typically keeping three copies of each data block. The popularity of data in HDFS changes over time. To achieve better performance and higher disk utilization, the replication policy of HDFS should be elastic and adapt to data popularity. In this paper, we describe ERMS, an elastic replication management system for HDFS. ERMS provides an active/standby storage model for HDFS. It uses a complex event processing engine to distinguish data types in real time, then dynamically adds extra replicas for hot data, removes those extra replicas when the data cool down, and uses erasure codes for cold data. ERMS also introduces a replica placement strategy for the extra replicas of hot data and for erasure coding parities. The experiments show that ERMS effectively improves the reliability and performance of HDFS and reduces storage overhead.
  • Keywords
    disc storage; real-time systems; replicated databases; storage management; ERMS; HDFS; active/standby storage model; complex event processing engine; data popularity; data set streaming; disk utilization; distributed storage system; elastic replication management system; erasure coding parities; hadoop distributed file system; large-scale data set; real-time data type; reliability; replica placement strategy; replicating data; replication policy; storage overhead; Availability; Bandwidth; Cloud computing; Encoding; Real-time systems; Throughput; Elastic; HDFS; Replication Management;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing Workshops (CLUSTER WORKSHOPS), 2012 IEEE International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4673-2893-7
  • Type

    conf

  • DOI
    10.1109/ClusterW.2012.25
  • Filename
    6355844