• DocumentCode
    2518745
  • Title
    Design of an exact data deduplication cluster
  • Author
    Kaiser, Jürgen; Meister, Dirk; Brinkmann, Andre; Effert, Sascha
  • Author_Institution
    Johannes Gutenberg-Univ., Mainz, Germany
  • fYear
    2012
  • fDate
    16-20 April 2012
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    Data deduplication is an important component of enterprise storage environments. The throughput and capacity limitations of single-node solutions have led to the development of clustered deduplication systems. Most implemented clustered inline solutions trade deduplication ratio for performance and are willing to miss opportunities to detect redundant data that a single-node system would detect. We present an inline deduplication cluster with a joint distributed chunk index, which is able to detect as much redundancy as a single-node solution. The use of locality and load-balancing paradigms enables the nodes to minimize information exchange. Therefore, we are able to show that, contrary to claims in previous papers, it is possible to combine exact deduplication, small chunk sizes, and scalability within one environment using only a commodity GBit Ethernet interconnect. Additionally, we investigate the throughput and scalability limitations with a special focus on the intra-node communication.
  • Keywords
    business data processing; electronic data interchange; local area networks; pattern clustering; resource allocation; storage management; capacity limitation; chunk size; clustered deduplication system; commodity GBit Ethernet interconnect; deduplication ratio; distributed chunk index; enterprise storage environment; exact data deduplication cluster design; information exchange; inline deduplication cluster; intra-node communication; load balancing; locality; redundant data; scalability limitation; single node solution; throughput; Containers; Fault tolerance; Fault tolerant systems; Home appliances; Indexes; Load management; Throughput;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Mass Storage Systems and Technologies (MSST), 2012 IEEE 28th Symposium on
  • Conference_Location
    San Diego, CA
  • ISSN
    2160-195X
  • Print_ISBN
    978-1-4673-1745-0
  • Electronic_ISBN
    2160-195X
  • Type
    conf
  • DOI
    10.1109/MSST.2012.6232380
  • Filename
    6232380