• DocumentCode
    2870691
  • Title
    Extreme Binning: Scalable, parallel deduplication for chunk-based file backup
  • Author
    Bhagwat, Deepavali ; Eshghi, Kave ; Long, Darrell D E ; Lillibridge, Mark
  • Author_Institution
    Univ. of California, Santa Cruz, CA, USA
  • fYear
    2009
  • fDate
    21-23 Sept. 2009
  • Firstpage
    1
  • Lastpage
    9
  • Abstract
    Data deduplication is an essential and critical component of backup systems: essential because it reduces storage space requirements, and critical because the performance of the entire backup operation depends on deduplication throughput. Traditional backup workloads consist of large data streams with high locality, and existing deduplication techniques rely on that locality to provide reasonable throughput. We present Extreme Binning, a scalable deduplication technique for non-traditional backup workloads made up of individual files with no locality among consecutive files in a given window of time. Because of this lack of locality, existing techniques perform poorly on these workloads. Extreme Binning exploits file similarity instead of locality and makes only one disk access for chunk lookup per file, which gives reasonable throughput. Multi-node backup systems built with Extreme Binning scale gracefully with the amount of input data; more backup nodes can be added to boost throughput. A stateless routing algorithm allocates each file to only one node, allowing maximum parallelization, and each backup node is autonomous, with no dependencies across nodes, which makes data management tasks robust and low-overhead.
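The abstract names two mechanisms, stateless routing of each file to one node and a single chunk-lookup access per file, without spelling out how they work. The sketch below illustrates one plausible reading under stated assumptions: each file is summarized by a "representative" chunk ID, taken here to be the minimum chunk hash (an assumption, not stated in this record). The file is routed to node `rep mod N`, and its chunks are deduplicated against one "bin" of chunk IDs keyed by that representative. Fixed-size chunking, SHA-1, and the in-memory `bins` dict standing in for on-disk bins are simplifications. All function names (`chunk_hashes`, `representative_id`, `route_to_node`, `dedupe_on_node`) are hypothetical.

```python
import hashlib

def chunk_hashes(data: bytes, chunk_size: int = 4096) -> list[str]:
    """Split data into fixed-size chunks and return the SHA-1 hex digest of each.

    Fixed-size chunking is a simplification for this sketch; content-defined
    chunking is more typical in deduplicating backup systems.
    """
    if not data:
        # Treat an empty file as a single empty chunk so it still has an ID.
        return [hashlib.sha1(b"").hexdigest()]
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

def representative_id(hashes: list[str]) -> str:
    # Assumption: the minimum chunk hash represents the file, so files sharing
    # most of their chunks are likely to share this ID. For equal-length
    # lowercase hex strings, lexicographic min equals numeric min.
    return min(hashes)

def route_to_node(data: bytes, num_nodes: int) -> int:
    # Stateless routing: the target node depends only on the file's contents
    # and the node count, so no shared index or coordination is needed.
    rep = representative_id(chunk_hashes(data))
    return int(rep, 16) % num_nodes

def dedupe_on_node(bins: dict[str, set[str]], data: bytes) -> list[str]:
    """Deduplicate one file on its node and return the chunk IDs it must store.

    `bins` maps a representative ID to the set of chunk IDs in that bin; one
    lookup per file stands in for the single disk access per file.
    """
    hashes = chunk_hashes(data)
    rep = representative_id(hashes)
    bin_ids = bins.setdefault(rep, set())  # the one bin fetch for this file
    # dict.fromkeys drops duplicate chunks within the file, preserving order.
    new_chunks = [h for h in dict.fromkeys(hashes) if h not in bin_ids]
    bin_ids.update(new_chunks)
    return new_chunks
```

As a usage example, two files containing the same chunks in a different order have the same representative ID under this scheme. They therefore route to the same node, and the second file stores no new chunks in that bin:

```python
a = b"A" * 4096 + b"B" * 4096
b = b"B" * 4096 + b"A" * 4096
assert route_to_node(a, 8) == route_to_node(b, 8)
bins = {}
print(len(dedupe_on_node(bins, a)), dedupe_on_node(bins, b))  # 2 []
```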
  • Keywords
    data handling; file organisation; parallel processing; replicated databases; security of data; backup nodes; backup operation; backup workloads; chunk lookup; chunk-based file backup; data deduplication; data management task; data streams; extreme binning; file allocation; multinode backup systems; parallel deduplication; scalable deduplication; stateless routing algorithm; Digital images; Electronic mail; Intrusion detection; Laboratories; Milling machines; Robustness; Routing; Space technology; Throughput; Web pages;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Modeling, Analysis & Simulation of Computer and Telecommunication Systems, 2009. MASCOTS '09. IEEE International Symposium on
  • Conference_Location
    London
  • ISSN
    1526-7539
  • Print_ISBN
    978-1-4244-4927-9
  • Electronic_ISBN
    1526-7539
  • Type
    conf
  • DOI
    10.1109/MASCOT.2009.5366623
  • Filename
    5366623